Abstract



We consider the problem of estimating the value of a linear functional in nonparametric
instrumental regression, where, in the presence of an instrument W, a response
Y is modeled as depending on an endogenous explanatory variable Z. The proposed
estimator is based on dimension reduction and additional thresholding. The minimax 
optimal rate of convergence of the estimator is derived assuming that the structural 
function and the representer of the linear functional belong to some ellipsoids which 
are in a certain sense linked to the conditional expectation operator of Z given W. We 
illustrate these results by considering classical smoothness assumptions. 
1 Introduction

Nonparametric instrumental regression models have attracted growing attention in the econometrics and statistics literature (cf. Florens [2003], Darolles et al. [2002], Newey and Powell [2003], Hall and Horowitz [2007] or Blundell et al. [2007], to name only a few). To be precise, these models deal with situations where the dependence of a response $Y$ on the variation of an endogenous vector $Z$ of explanatory variables is characterized by

$$Y = \varphi(Z) + U \tag{1.1a}$$

for some error term $U$, and there exists an exogenous vector of instruments $W$ such that

$$\mathbb{E}[U \mid W] = 0. \tag{1.1b}$$

The nonparametric relationship is thereby modeled by the structural function $\varphi$. Typical examples leading to such a situation are error-in-variable models, simultaneous equations or treatment models with endogenous selection. However, it is worth noting that in the presence of instrumental variables the model equations (1.1a-1.1b) are the natural generalization of a standard parametric model (see, e.g., Amemiya [1974]) to the nonparametric situation. This extension was first introduced by Florens [2003] and Newey and Powell [2003], while its identification has been studied e.g. in Carrasco et al. [2006], Darolles et al. [2002] and Florens et al. [2007]. It is interesting to note that recent applications and extensions of this approach include nonparametric tests of exogeneity (Blundell and Horowitz [2007]), quantile regression models (Horowitz and Lee [2007]), or semiparametric modeling (Florens et al. [2009]), to name but a few.

The nonparametric estimation of the structural function $\varphi$ given a sample of $(Y, Z, W)$ has been intensively studied in the literature. For example, Ai and Chen [2003], Blundell et al. [2007] or Newey and Powell [2003] consider sieve minimum distance estimators, while Darolles et al. [2002], Hall and Horowitz [2005], Gagliardini and Scaillet [2006] or Florens et al. [2007] study penalized least squares estimators. However, as noticed by Newey and Powell [2003] and Florens [2003], the nonparametric estimation of the structural function $\varphi$ generally leads to an ill-posed inverse problem. Precisely, consider the model equations (1.1a-1.1b); taking the conditional expectation with respect to the instruments $W$ on both sides of equation (1.1a) leads to the conditional moment equation

$$\mathbb{E}[Y \mid W] = \mathbb{E}[\varphi(Z) \mid W]. \tag{1.2}$$

Therefore, the estimation of the structural function $\varphi$ is linked to the inversion of equation (1.2), which under fairly mild assumptions is not stable and hence an ill-posed inverse problem (for a comprehensive review of inverse problems in econometrics we refer to Carrasco et al. [2006]).

The instability of the conditional moment equation (1.2) essentially implies that all proposed estimators of the structural function $\varphi$ have, under reasonable assumptions, very poor rates of convergence. In other words, even relatively large sample sizes may not be of much help in accurately estimating the structural function $\varphi$. In contrast, it might be possible to estimate certain local features of $\varphi$, such as the value of a linear functional $\ell_h(\varphi) := \mathbb{E}[h(Z)\varphi(Z)]$ with respect to some given representer $h$, at the usual parametric rate of convergence. Take as an example the case of an endogenous regressor $Z$ uniformly distributed on $[0, 1]$. In this situation, rather than estimating the structural function $\varphi$ itself, one may be interested in its average value $\int_a^b \varphi(t)\,dt$ over a certain interval $[a, b]$, which equals the value $\ell_h(\varphi)$ of a linear functional with representer given by the characteristic function $h = 1_{[a,b]}$. Then it is of interest to characterize the attainable accuracy of any estimator, for example in terms of the mean squared error (MSE), which obviously depends on the representer $h$ and the conditions imposed on $\varphi$. It is worth noting that the nonparametric estimation of the value of a linear functional from Gaussian white noise observations is the subject of a considerable literature (cf. Speckman [1979], Li [1982] or Ibragimov and Has'minskii [1984] in case of direct observations, while in case of indirect observations we refer to Donoho and Low [1992], Donoho [1994] or Goldenshluger and Pereverzev [2000] and references therein). However, as far as we know this question has not yet been addressed in nonparametric instrumental regression, which in general is not a Gaussian white noise model.

The objective of this paper is the nonparametric estimation of the value $\ell_h(\varphi)$ of a linear functional based on an independent and identically distributed (i.i.d.) sample of $(Y, Z, W)$ obeying (1.1a-1.1b). We follow an approach often used in the literature to construct an estimator of the value of a linear functional: we replace the unknown structural function $\varphi$ in $\ell_h(\varphi)$ by an estimator. Therefore, let us first motivate the estimator of $\varphi$ (for its asymptotic properties we refer to Johannes [2009]). Suppose for a moment that the structural function can be developed by using only $m$ pre-specified functions $e_1, \dots, e_m$, say $\varphi = \sum_{j=1}^m [\varphi]_j e_j$, where now only the coefficients $[\varphi]_1, \dots, [\varphi]_m$ are unknown. Thereby, the conditional moment equation (1.2) reduces to a multivariate linear conditional moment equation, that is, $\mathbb{E}[Y \mid W] = \sum_{j=1}^m [\varphi]_j \mathbb{E}[e_j(Z) \mid W]$. Notice that solving this equation is a classical textbook problem in econometrics (cf. Pagan and Ullah [1999]). One popular approach is to replace the conditional moment equation by unconditional ones. Therefore, given $m$ functions $f_1, \dots, f_m$ one may consider $m$ unconditional moment equations in place of the multivariate conditional moment equation, that is, $\mathbb{E}[Y f_l(W)] = \sum_{j=1}^m [\varphi]_j \mathbb{E}[e_j(Z) f_l(W)]$, $l = 1, \dots, m$. Notice that once the functions $\{f_l\}_{l=1}^m$ are chosen, all the unknown quantities in the unconditional moment equations can be estimated straightforwardly by replacing the theoretical expectation by its empirical counterpart. Moreover, a least squares solution of the estimated equation then leads under very mild assumptions to a consistent and asymptotically normal estimator of the parameter vector $([\varphi]_j)_{j=1}^m$. Furthermore, the choice of the functions $\{f_l\}_{l=1}^m$ directly influences the asymptotic variance of the estimator and thus the question of optimal instruments arises (cf. Newey and Powell [2003]). However, our objective is the estimation of the value of a linear functional. For simplicity suppose the regressor $Z$ is uniformly distributed on $[0, 1]$ and the linear functional is given by the representer $h = 1_{[a,b]}$, that is, $\ell_h(\varphi) = \int_a^b \varphi(t)\,dt$. In case $\varphi = \sum_{j=1}^m [\varphi]_j e_j$ the value of the linear functional writes $\ell_h(\varphi) = \sum_{j=1}^m [h]_j [\varphi]_j$, where the coefficients $[h]_j := \int_a^b e_j(t)\,dt$, $1 \le j \le m$, are known. A natural estimator of $\ell_h(\varphi)$ is then defined by replacing the unknown coefficients $[\varphi]_j$ by their least squares estimators. This approach is very simple and the estimator can be calculated with most statistical software. However, it has a major drawback, since in most situations an infinite number of functions $\{e_j\}_{j \ge 1}$ and associated coefficients $([\varphi]_j)_{j \ge 1}$ is needed to develop the structural function $\varphi$. The choice of the functions $\{e_j\}_{j \ge 1}$ now reflects the a priori information (such as smoothness) about the structural function $\varphi$. However, if we also consider an infinite number of functions $\{f_l\}_{l \ge 1}$, then for each $m \ge 1$ we could still consider the least squares estimator described above. Notice that the dimension $m$ plays here the role of a smoothing parameter and we may hope that the estimator of the structural function $\varphi$ (hence of the value $\ell_h(\varphi)$) is also consistent as $m$ tends suitably to infinity. Unfortunately, if $\varphi_m := \sum_{j=1}^m [\varphi_m]_j e_j$ denotes a least squares solution of the reduced unconditional moment equations, that is, the vector of coefficients $([\varphi_m]_j)_{j=1}^m$ minimizes the quantity $\sum_{l=1}^m \{\mathbb{E}[Y f_l(W)] - \sum_{j=1}^m \beta_j \mathbb{E}[e_j(Z) f_l(W)]\}^2$ over all $(\beta_j)_{j=1}^m$, then $\varphi_m$ converges to the true structural function as $m$ tends to infinity only under an additional assumption (defined below) on the basis $\{f_l\}_{l \ge 1}$. In this paper we show under this additional assumption that, in terms of the MSE, a plug-in estimator of $\ell_h(\varphi)$ using a least squares estimator of $\varphi$ based on a dimension reduction together with an additional thresholding is consistent and can attain optimal rates of convergence. It is worth noting that all the results in this paper are obtained without an additional smoothness assumption on the joint density of $(Y, Z, W)$. In fact we do not even impose that a joint density exists.
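
The finite-dimensional plug-in idea described above can be sketched in a few lines of Python. The snippet computes the known coefficients $[h]_j = \int_a^b e_j(t)\,dt$ for the representer $h = 1_{[a,b]}$ and the value $\ell_h(\varphi) = \sum_{j=1}^m [h]_j[\varphi]_j$. The cosine basis $e_1 = 1$, $e_j(t) = \sqrt{2}\cos(\pi (j-1) t)$ and the function names are assumptions made for illustration only; the text does not fix a particular basis at this point.

```python
import numpy as np


def indicator_coefficients(a, b, m):
    """Coefficients [h]_j = int_a^b e_j(t) dt, j = 1..m, of h = 1_[a,b].

    Assumed (illustrative) basis on [0, 1]:
    e_1(t) = 1 and e_j(t) = sqrt(2) * cos(pi * (j - 1) * t) for j >= 2.
    """
    k = np.arange(1, m)  # frequencies 1, ..., m - 1
    coef = np.empty(m)
    coef[0] = b - a  # integral of the constant function e_1
    # closed-form integral of sqrt(2) * cos(pi * k * t) over [a, b]
    coef[1:] = np.sqrt(2.0) * (np.sin(np.pi * k * b) - np.sin(np.pi * k * a)) / (np.pi * k)
    return coef


def functional_value(h_coef, phi_coef):
    """Value sum_j [h]_j [phi]_j of the linear functional for a finite expansion."""
    return float(np.dot(h_coef, phi_coef))


# Example: phi = e_1 + 0.5 * e_2 - 0.25 * e_3, averaged over [0.2, 0.7]
h = indicator_coefficients(0.2, 0.7, 3)
value = functional_value(h, np.array([1.0, 0.5, -0.25]))
```

In an estimation setting the vector passed as `phi_coef` would be the least squares estimate of $([\varphi]_j)_{j=1}^m$ rather than the true coefficients.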

The paper is organized in the following way. In Section 2 we introduce our basic assump- 
tions and derive a lower bound for estimating the value of a linear functional based on an 
i.i.d. sample obeying the model equations (1.1a-1.1b). In Section 3 under certain moment
assumptions we show in terms of the MSE first consistency of the proposed estimator and 
second its minimax-optimality. We illustrate the general results in Section 4 by considering 
classical smoothness assumptions. All proofs can be found in the Appendix. 
2 Complexity of local estimation: a lower bound.



2.1 Basic model assumptions. 

It is convenient to rewrite the moment equation (1.2) in terms of an operator between Hilbert spaces. Let us first introduce the Hilbert spaces

$$L^2_Z = \{\phi : \mathbb{R}^p \to \mathbb{R};\ \|\phi\|_Z^2 := \mathbb{E}[\phi^2(Z)] < \infty\},$$
$$L^2_W = \{\psi : \mathbb{R}^q \to \mathbb{R};\ \|\psi\|_W^2 := \mathbb{E}[\psi^2(W)] < \infty\},$$

which are endowed with the corresponding inner products $\langle \phi, \tilde\phi \rangle_Z = \mathbb{E}[\phi(Z)\tilde\phi(Z)]$, $\phi, \tilde\phi \in L^2_Z$, and $\langle \psi, \tilde\psi \rangle_W = \mathbb{E}[\psi(W)\tilde\psi(W)]$, $\psi, \tilde\psi \in L^2_W$, respectively. Then the conditional expectation of $Z$ given $W$ defines a linear operator $T\phi := \mathbb{E}[\phi(Z) \mid W]$, $\phi \in L^2_Z$, which maps $L^2_Z$ into $L^2_W$. Thereby the moment equation (1.2) can be written as

$$g := \mathbb{E}[Y \mid W] = \mathbb{E}[\varphi(Z) \mid W] =: T\varphi \tag{2.1}$$

where the function $g$ belongs to $L^2_W$. Estimation of the structural function $\varphi$ is thus linked with the inversion of the conditional expectation operator $T$ and is hence called an inverse problem. Moreover, we suppose throughout the paper that the operator $T$ is compact, which is satisfied under fairly mild assumptions (cf. Carrasco et al. [2006]). Consequently, unlike in a multivariate linear instrumental regression model, a continuous generalized inverse of $T$ does not exist as long as the range of the operator $T$ is an infinite dimensional subspace of $L^2_W$. This corresponds to the setup of ill-posed inverse problems (with the additional difficulty that $T$ is unknown and hence has to be estimated). In what follows we always assume that there exists a unique solution $\varphi \in L^2_Z$ of equation (2.1), i.e., $g$ belongs to the range $\mathcal{R}(T)$ of $T$, and that the null space $\mathcal{N}(T)$ of $T$ is trivial or equivalently $T$ is injective (for a detailed discussion in the context of inverse problems see Chapter 2.1 in Engl et al. [2000], while in the special case of a nonparametric instrumental regression we refer to Carrasco et al. [2006]). Furthermore, we suppose that the representer $h$ of the linear functional $\ell_h(\cdot) := \langle \cdot, h \rangle_Z$ of interest is an element of $L^2_Z$ as well. Then it is straightforward to see that the value of the linear functional $\ell_h(\varphi)$ is identified if and only if $h$ belongs to the orthogonal complement $\mathcal{N}(T)^\perp$ of the null space $\mathcal{N}(T)$. Hence, for all $h \in L^2_Z$ the identification is in particular guaranteed under the assumption of an injective conditional expectation operator $T$.

2.2 Notations and regularity assumptions. 

In this section we show that the obtainable accuracy of any estimator of the value $\ell_h(\varphi)$ of a linear functional can be essentially determined by additional regularity conditions imposed on the structural function $\varphi$, the representer $h$ and the conditional expectation operator $T$. In this paper these conditions are characterized through different weighted norms in $L^2_Z$ with respect to a pre-specified orthonormal basis $\{e_j\}_{j \ge 1}$ in $L^2_Z$, which we formalize now. Given a strictly positive sequence of weights $w := (w_j)_{j \ge 1}$ and a constant $c > 0$ we denote for all $r \in \mathbb{R}$ by $\mathcal{F}_{w^r}^c$ the ellipsoid defined by

$$\mathcal{F}_{w^r}^c := \Big\{\phi \in L^2_Z : \sum_{j=1}^{\infty} w_j^r\, |\langle \phi, e_j \rangle_Z|^2 =: \|\phi\|_{w^r}^2 \le c\Big\}. \tag{2.2}$$

Furthermore, let $\mathcal{F}_{w^r} := \{\phi \in L^2_Z : \|\phi\|_{w^r}^2 < \infty\}$. It is worth noting that in case $w \equiv 1$ we have $\|\phi\|_w = \|\phi\|_Z$ for all $\phi \in L^2_Z$ and hence the set $\mathcal{F}_w^c$ denotes an ellipsoid in $L^2_Z$ which does not impose additional restrictions.
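
As a small illustration of the weighted norm in (2.2), the following Python sketch evaluates $\|\phi\|_{w^r}^2$ and checks membership in the ellipsoid for a function given by finitely many basis coefficients. Truncating the series to a finite coefficient vector and the function names are assumptions made for the example.

```python
import numpy as np


def weighted_norm_sq(coef, w, r=1.0):
    """Squared weighted norm sum_j w_j^r * coef_j^2 of a finite coefficient vector."""
    coef = np.asarray(coef, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w ** r * coef ** 2))


def in_ellipsoid(coef, w, r, c):
    """True if the coefficient vector lies in the ellipsoid of radius c."""
    return weighted_norm_sq(coef, w, r) <= c


# With constant weights the weighted norm reduces to the usual squared norm.
plain = weighted_norm_sq([1.0, 0.5], [1.0, 1.0])
# Growing weights penalize the higher coefficients more strongly.
weighted = weighted_norm_sq([1.0, 0.5], [1.0, 4.0])
```

Faster growing weights shrink the ellipsoid for a fixed radius, which is how the sequences encode prior information such as smoothness.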

Minimal regularity conditions. Let $\gamma := (\gamma_j)_{j \ge 1}$ and $\omega := (\omega_j)_{j \ge 1}$ denote two sequences of weights. Then we suppose, here and subsequently, that the structural function $\varphi$ belongs to the ellipsoid $\mathcal{F}_\gamma^\rho$ for some $\rho > 0$ and that the representer $h$ of the linear functional $\ell_h$ is an element of the ellipsoid $\mathcal{F}_\omega^\tau$ for some $\tau > 0$. The ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$ capture all the prior information (such as smoothness) about the unknown structural function $\varphi$ and the given representer $h$ respectively. Furthermore, as usual in the context of ill-posed inverse problems, we specify the mapping properties of the conditional expectation operator $T$. Therefore, consider the sequence $(\|Te_j\|_W)_{j \ge 1}$, which converges to zero since $T$ is compact. In what follows we impose restrictions on the decay of this sequence. Denote by $\mathcal{T}$ the set of all injective compact operators mapping $L^2_Z$ into $L^2_W$. Given a sequence of weights $v := (v_j)_{j \ge 1}$ and $d \ge 1$ we define the subset $\mathcal{T}_d^v$ of $\mathcal{T}$ by

$$\mathcal{T}_d^v := \big\{T \in \mathcal{T} : \|\phi\|_v^2 / d \le \|T\phi\|_W^2 \le d\, \|\phi\|_v^2, \ \forall \phi \in L^2_Z\big\}. \tag{2.3}$$

Notice that for all $T \in \mathcal{T}_d^v$ it follows that$^1$ $\|Te_j\|_W^2 \asymp_d v_j$. Hence, the sequence $(v_j)_{j \ge 1}$ has to be strictly positive since $T$ is injective. Furthermore, let us denote by $T^* : L^2_W \to L^2_Z$ the adjoint of $T$, which satisfies $T^*\psi = \mathbb{E}[\psi(W) \mid Z]$. If now $T \in \mathcal{T}$ and $\{\lambda_j, e_j\}_{j \ge 1}$ is an eigenvalue decomposition of $T^*T$, then the condition $T \in \mathcal{T}_d^v$ is satisfied if and only if $\lambda_j \asymp_d v_j$. In other words, in this situation the sequence $v$ specifies the decay of the eigenvalues of $T^*T$. In what follows all the results are derived under regularity conditions on the structural function $\varphi$, the representer $h$ and the conditional expectation operator $T$ described through the sequences $\gamma$, $\omega$ and $v$ respectively. However, we provide below illustrations of these conditions by assuming a "regular decay" of these sequences. The next assumption summarizes our minimal regularity conditions on these sequences.

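Assumption 2.1. Let $\gamma := (\gamma_j)_{j \ge 1}$, $\omega := (\omega_j)_{j \ge 1}$ and $v := (v_j)_{j \ge 1}$ be strictly positive sequences of weights with $\gamma_1 = 1$, $\omega_1 = 1$ and $v_1 = 1$ such that $\gamma$ and $\omega$ are non-decreasing and $v$ is non-increasing. Furthermore, there exists a constant $A \ge 1$ such that $v_m \sup_{1 \le j \le m} \{v_j^{-1}\omega_j^{-1}\} \le A \max(\omega_m^{-1}, v_m)$ for all $m \in \mathbb{N}$.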

We shall stress that $\mathcal{F}_\gamma^\rho$ is just an ellipsoid in $L^2_Z$ in case $\gamma \equiv 1$; hence in this situation no additional regularity condition is imposed on the structural function $\varphi$. Furthermore, the last condition in Assumption 2.1 is obviously satisfied with $A = 1$ if the sequence $(\omega_j v_j)_{j \ge 1}$ is either monotonically decreasing or increasing.

2.3 The lower bound. 

In the proof of the next theorem we show that a one-dimensional subproblem captures the full difficulty in estimating a linear functional in nonparametric instrumental regression. In other words, there exist two sequences of structural functions $\varphi_{1,n}, \varphi_{2,n} \in \mathcal{F}_\gamma^\rho$, which are statistically not consistently distinguishable, and a sequence of representers $h_n \in \mathcal{F}_\omega^\tau$ such that $|\ell_{h_n}(\varphi_{1,n}) - \ell_{h_n}(\varphi_{2,n})|^2 \ge C \delta_n^*$, where $\delta_n^*$ is the optimal rate of convergence. Moreover, we obtain the following lower bound under the additional assumption that there exist error terms $U_{i,n}$, $i = 1, 2$, such that the conditional distribution of $\varphi_{i,n}(Z) - [T\varphi_{i,n}](W) + U_{i,n}$ given the instrument $W$ is Gaussian with mean zero and variance one. A similar assumption has recently been used by Chen and Reiß [2008] in order to derive a lower bound for the estimation of the structural function $\varphi$ itself. In particular the authors show that, in contrast to the present work, a one-dimensional subproblem is not sufficient to describe the full difficulty in estimating $\varphi$.

$^1$ We write $a \asymp_d b$ if $d^{-1} \le b/a \le d$.

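Theorem 2.1. Assume an n-sample of $(Y, Z, W)$ from the model (1.1a-1.1b) with error term $U$ belonging to $\mathcal{U}_\sigma := \{U : \mathbb{E}[U \mid W] = 0 \text{ and } \mathbb{E}[U^2 \mid W] \le \sigma^2\}$, $\sigma > 0$. Let $\gamma$, $\omega$ and $v$ be sequences satisfying Assumption 2.1. Suppose that the conditional expectation operator $T$ associated with $(Z, W)$ belongs to $\mathcal{T}_d^v$, $d \ge 1$, and that $\sup_{j \in \mathbb{N}} \mathbb{E}[e_j^2(Z) \mid W] \le \eta$, $\eta \ge 1$. Let $m_* := m_*(n) \in \mathbb{N}$ and $\delta_n^* \in \mathbb{R}^+$ be chosen such that for some $A \ge 1$

$$\frac{1}{A} \le \frac{\gamma_{m_*}}{n\, v_{m_*}} \le A \quad \text{and} \quad \delta_n^* := \gamma_{m_*}^{-1}\, \omega_{m_*}^{-1}. \tag{2.4}$$

If in addition $\sigma$ is sufficiently large then for any estimator $\breve{\ell}$ we have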
sup sup sup |e|I-4((/?)P| ^max(5;, -) min(^ , 

Remark 2.1. In the last theorem the additional moment condition $\sup_{j \in \mathbb{N}} \mathbb{E}[e_j^2(Z) \mid W] \le \eta$ is obviously satisfied if the basis functions $\{e_j\}$ are uniformly bounded (e.g. the trigonometric basis considered in Section 4). However, if $V$ denotes a Gaussian random variable with mean zero and variance one, which is independent of $(Z, W)$, then the additional moment condition ensures that for all structural functions of the form $\varphi = a \cdot e_j \in \mathcal{F}_\gamma^\rho$ with $j \ge 1$ and $a \in \mathbb{R}$, the error term $U := V - \varphi(Z) + [T\varphi](W)$ belongs to $\mathcal{U}_\sigma$ for all sufficiently large $\sigma$. This specific case is only needed to simplify the calculation of the distance between distributions corresponding to different structural functions. On the other hand, below we derive an upper bound assuming that the error term $U$ belongs to $\mathcal{U}_\sigma$ and that the joint distribution of $(Z, W)$ fulfills additional moment conditions. Obviously in this situation Theorem 2.1 provides a lower bound for any estimator as long as $\sigma$ is sufficiently large. Furthermore, it is worth noting that this lower bound tends to zero only if $(\gamma_j \omega_j)_{j \ge 1}$ is a divergent sequence. In other words, in case $\gamma \equiv 1$, i.e., without any additional restriction on $\varphi$, consistency of an estimator of $\ell_h(\varphi)$ uniformly over all $\varphi \in \mathcal{F}_\gamma^\rho$ is only possible under restrictions on the representer $h \in \mathcal{F}_\omega^\tau$, that is, $\omega$ is a divergent sequence. This obviously reflects the ill-posedness of the underlying inverse problem. Finally, it is important to note that the regularity conditions imposed on the structural function $\varphi$, the representer $h$ and the conditional expectation operator $T$ involve only the basis $\{e_j\}_{j \ge 1}$ in $L^2_Z$. Therefore, the lower bound derived in Theorem 2.1 does not capture the influence of the basis $\{f_l\}_{l \ge 1}$ in $L^2_W$ used to construct the estimator. In other words, an estimator of the value $\ell_h(\varphi)$ can only attain this lower bound if $\{f_l\}_{l \ge 1}$ is appropriately chosen. □

3 Minimax-optimal local estimation: the general case.

3.1 Estimation by dimension reduction and thresholding. 

In addition to the basis $\{e_j\}_{j \ge 1}$ in $L^2_Z$ considered in the last section we now also introduce a second basis $\{f_l\}_{l \ge 1}$ in $L^2_W$. We derive in this section the asymptotic properties of the estimator under minimal assumptions on those bases. Precisely, we first show consistency of the proposed estimator under fairly mild additional moment assumptions. In particular, we do not impose any regularity assumption on either the structural function $\varphi$ or the representer $h$. In a second step we suppose that the structural function and the representer $h$ belong to some ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$ respectively, and that the conditional expectation satisfies a link condition, i.e., $T \in \mathcal{T}_d^v$. Furthermore, we introduce an additional condition linked to the basis $\{f_l\}_{l \ge 1}$. Then under stronger moment conditions we show that the proposed estimator attains the lower bound derived in the last section. All these results are illustrated in the next section by considering classical smoothness assumptions.

Matrix and operator notations. Given $m \ge 1$, $\mathcal{E}_m$ and $\mathcal{F}_m$ denote the subspaces of $L^2_Z$ and $L^2_W$ spanned by the functions $\{e_j\}_{j=1}^m$ and $\{f_l\}_{l=1}^m$, respectively. $E_m$ and $E_m^\perp$ (resp. $F_m$ and $F_m^\perp$) denote the orthogonal projections on $\mathcal{E}_m$ (resp. $\mathcal{F}_m$) and its orthogonal complement $\mathcal{E}_m^\perp$ (resp. $\mathcal{F}_m^\perp$), respectively. Given an operator (matrix) $K$, $\|K\|$ denotes its operator norm. The inverse operator (matrix) of $K$ is denoted by $K^{-1}$, the adjoint (transposed) operator (matrix) of $K$ by $K^*$. The identity operator (matrix) is denoted by $I$. $[\phi]$, $[\psi]$ and $[K]$ denote the (infinite) vector and matrix of the function $\phi \in L^2_Z$, $\psi \in L^2_W$ and the operator $K : L^2_Z \to L^2_W$ with the entries $[\phi]_j = \langle \phi, e_j \rangle_Z$, $[\psi]_l = \langle \psi, f_l \rangle_W$ and $[K]_{l,j} = \langle K e_j, f_l \rangle_W$, respectively. The upper $m$ subvector and $m \times m$ submatrix of $[\phi]$, $[\psi]$ and $[K]$ are denoted by $[\phi]_m$, $[\psi]_m$ and $[K]_m$, respectively. Note that $[K^*]_m = [K]_m^t$. The diagonal matrix with entries $v$ is denoted by $\mathrm{Diag}(v)$. Clearly, $[E_m \phi]_m = [\phi]_m$ and if we restrict $F_m K E_m$ to an operator from $\mathcal{E}_m$ into $\mathcal{F}_m$, then it has the matrix $[K]_m$.

Consider the conditional expectation operator $T$ associated to the regressor $Z$ and the instrument $W$. If $[e(Z)]$ and $[f(W)]$ denote the infinite random vectors with entries $e_j(Z)$ and $f_j(W)$ respectively, then $[T]_m = \mathbb{E}\,[f(W)]_m [e(Z)]_m^t$, which is throughout the paper assumed to be non-singular for all $m \ge 1$ (or at least for large enough $m$), so that $[T]_m^{-1}$ always exists. Note that it is a nontrivial problem to determine when such an assumption holds (see e.g. Efromovich and Koltchinskii [2001] and references therein). Under this assumption the notation $T_m^{-1}$ is used for the operator from $L^2_W$ into $L^2_Z$ whose matrix in the bases $\{e_j\}_{j \ge 1}$ and $\{f_l\}_{l \ge 1}$ has the entries $([T]_m^{-1})_{j,l}$ for $1 \le j, l \le m$ and zeros otherwise.

Definition of the estimator. Let $(Y_1, Z_1, W_1), \dots, (Y_n, Z_n, W_n)$ be an i.i.d. sample of $(Y, Z, W)$. Since $[T]_m = \mathbb{E}\,[f(W)]_m [e(Z)]_m^t$ and $[g]_m = \mathbb{E}\, Y [f(W)]_m$ we construct estimators by using their empirical counterparts, that is,

$$[\hat{T}]_m := \frac{1}{n} \sum_{i=1}^n [f(W_i)]_m [e(Z_i)]_m^t \quad \text{and} \quad [\hat{g}]_m := \frac{1}{n} \sum_{i=1}^n Y_i [f(W_i)]_m. \tag{3.1}$$

Then the estimator of the linear functional $\ell_h(\varphi)$ is defined by

$$\hat{\ell}_h := \begin{cases} [h]_m^t [\hat{T}]_m^{-1} [\hat{g}]_m, & \text{if } [\hat{T}]_m \text{ is nonsingular and } \|[\hat{T}]_m^{-1}\| \le \alpha, \\ 0, & \text{otherwise}, \end{cases} \tag{3.2}$$

where the dimension parameter $m = m(n)$ and the threshold $\alpha = \alpha(n)$ have to tend to infinity as the sample size $n$ increases. In fact, the estimator $\hat{\ell}_h$ is obtained from the linear functional $\ell_h(\varphi)$ by replacing the unknown structural function $\varphi$ by an estimator proposed by Johannes [2009], which takes its inspiration from the linear Galerkin approach coming from the inverse problem community (cf. Efromovich and Koltchinskii [2001] or Hoffmann and Reiß [2008]).
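
The construction in (3.1) and (3.2) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes scalar $Z$ and $W$ taking values in $[0, 1]$, uses the same cosine basis for both $\{e_j\}$ and $\{f_l\}$, and thresholds on the spectral norm of the inverted empirical matrix; the function names are hypothetical.

```python
import numpy as np


def cosine_basis(x, m):
    """Evaluate the first m cosine basis functions on [0, 1] at the points x.

    Assumed basis: b_1 = 1 and b_j(x) = sqrt(2) * cos(pi * (j - 1) * x).
    Returns an array of shape (len(x), m).
    """
    x = np.asarray(x, dtype=float)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(m)))
    B[:, 0] = 1.0  # the constant function is not scaled by sqrt(2)
    return B


def functional_estimate(Y, Z, W, h_coef, alpha):
    """Thresholded plug-in estimate of the linear functional.

    h_coef holds the m known coefficients [h]_1, ..., [h]_m; alpha is the
    threshold on the operator norm of the inverse empirical matrix.
    """
    Y = np.asarray(Y, dtype=float)
    h_coef = np.asarray(h_coef, dtype=float)
    m, n = len(h_coef), len(Y)
    E = cosine_basis(Z, m)   # rows are [e(Z_i)]_m
    F = cosine_basis(W, m)   # rows are [f(W_i)]_m
    T_hat = F.T @ E / n      # empirical counterpart of [T]_m
    g_hat = F.T @ Y / n      # empirical counterpart of [g]_m
    if np.linalg.matrix_rank(T_hat) < m:
        return 0.0           # singular matrix: estimator is set to zero
    T_inv = np.linalg.inv(T_hat)
    if np.linalg.norm(T_inv, 2) > alpha:
        return 0.0           # inverse too large: thresholding sets it to zero
    return float(h_coef @ T_inv @ g_hat)
```

The threshold guards against inverting a nearly singular empirical matrix, which is where the instability of the underlying inverse problem shows up in finite samples.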
3.2 Consistency.

We start by providing minimal conditions used to prove consistency of the estimator. More
specifically, we first formalize additional moment assumptions on the bases under consideration.

Assumption A1. The joint distribution of {Z,W) satisfies sup^g^v E ^ r/^ and 

suPj^eN Var(ej(Z)//(Ty)) < r/^ for some r? ^ 1. 

It is worth noting that Assumption A1 is always fulfilled in case both bases are
uniformly bounded (e.g. in case of the trigonometric basis considered in Section 4). The
next assertion summarizes our minimal conditions to ensure consistency of the estimator
introduced in (3.2).

Proposition 3.1. Assume an n-sample of $(Y, Z, W)$ from the model (1.1a-1.1b). Suppose that the error term $U$ satisfies $\mathbb{E}[U^2 \mid W] \le \sigma^2$ with $\sigma > 0$ and that the joint distribution of $(Z, W)$ fulfills Assumption A1. Let $\hat{\ell}_h$ be defined with dimension $m := m(n)$ and threshold $\alpha := \alpha(n)$ satisfying $\alpha \ge 2\|[T]_m^{-1}\|$ and, as $n \to \infty$, $1/m = o(1)$ and $m^2\alpha^2 = o(n)$. If in addition $\sup_{m \in \mathbb{N}} \|T_m^{-1} F_m T E_m^\perp\| < \infty$, then we have $\mathbb{E}|\hat{\ell}_h - \ell_h(\varphi)|^2 = o(1)$ as $n \to \infty$.

The last result shows consistency of the estimator without an a priori regularity assumption on either the structural function $\varphi$ or the representer $h$. However, consistency is only obtained under the condition $\sup_{m \in \mathbb{N}} \|T_m^{-1} F_m T E_m^\perp\| < \infty$, which is known to be necessary to ensure $L^2$-convergence of the least squares solution $\varphi_m = \sum_{j=1}^m [\varphi_m]_j e_j$ with $[\varphi_m]_m = [T]_m^{-1}[g]_m$ to the structural function $\varphi$ as $m \to \infty$. Notice that this condition now also involves the basis $\{f_l\}_{l \ge 1}$ in $L^2_W$. In what follows we introduce an alternative but stronger condition to guarantee the $L^2$-consistency, which extends the link condition (2.3), that is, $T \in \mathcal{T}_d^v$. We denote by $\mathcal{T}_{d,D}^v$ for some $D \ge d$ the subset of $\mathcal{T}_d^v$ given by

$$\mathcal{T}_{d,D}^v := \Big\{T \in \mathcal{T}_d^v : \sup_{m \in \mathbb{N}} \big\|[\mathrm{Diag}(v)]_m^{1/2} [T]_m^{-1}\big\|^2 \le D\Big\}. \tag{3.3}$$

Remark 3.1. If $\{\lambda_j^{1/2}, e_j, f_j\}_{j \ge 1}$ is a singular value decomposition of $T \in \mathcal{T}$ then for all $m \ge 1$ the matrix $[T]_m$ is diagonalized with diagonal entries $[T]_{jj} = \lambda_j^{1/2}$, $1 \le j \le m$. Therefore, the link condition (2.3) holds true, that is, $T \in \mathcal{T}_d^v$, if and only if $\lambda_j \asymp_d v_j$ for all $j \in \mathbb{N}$. Moreover, it is easily seen that $\sup_{m \in \mathbb{N}} \|[\mathrm{Diag}(v)]_m^{1/2} [T]_m^{-1}\|^2 \le d$ and hence the extended link condition (3.3) is fulfilled, that is, $T \in \mathcal{T}_{d,D}^v$ for all $D \ge d$. Furthermore, the extended link condition equals the link condition ($\mathcal{T}_d^v = \mathcal{T}_{d,D}^v$ for suitable $D > 0$) if $[T]$ is only a small perturbation of $\mathrm{Diag}(v^{1/2})$ or if $T$ is strictly positive (for a detailed discussion we refer to Efromovich and Koltchinskii [2001] and Cardot and Johannes [2008] respectively).

We shall stress that once both bases $\{e_j\}_{j \ge 1}$ and $\{f_l\}_{l \ge 1}$ are specified the extended link condition (3.3) restricts the class of joint distributions of $(Z, W)$ to those for which the least squares solution $\varphi_m$ is $L^2$-consistent. Moreover, it is shown in Johannes [2009] that under the extended link condition a least squares estimator of $\varphi$ based on a dimension reduction together with an additional thresholding can attain minimax-optimal rates of convergence. In this sense, given a joint distribution of $(Z, W)$, a basis $\{f_l\}_{l \ge 1}$ satisfying the extended link condition can be interpreted as optimal instruments. However, for each pre-specified basis $\{e_j\}_{j \ge 1}$ we can theoretically construct a basis $\{f_l\}_{l \ge 1}$ such that the extended link condition is not a stronger restriction than the link condition (2.3). To be more precise, if $T \in \mathcal{T}_d^v$, which involves only the basis $\{e_j\}_{j \ge 1}$, then it is not hard to see that the fundamental inequality of Heinz [1951] implies $\|(T^*T)^{-1/2} e_j\|_Z^2 \asymp_d v_j^{-1}$ for all $j \ge 1$. Thereby, the function $(T^*T)^{-1/2} e_j$ is an element of $L^2_Z$ and hence there exists $f_j := T(T^*T)^{-1/2} e_j \in L^2_W$, $j \ge 1$. Then it is easily checked that $\{f_l\}_{l \ge 1}$ is an orthonormal system and moreover a basis of the closure of the range $\mathcal{R}(T)$ of $T$. Hence by taking any basis of the orthogonal complement $\mathcal{R}(T)^\perp$ of $\mathcal{R}(T)$ we may complete the orthonormal set $\{f_l\}_{l \ge 1}$ to become a basis of $L^2_W$. Then it is straightforward to see that $[T]_m$ is symmetric and moreover strictly positive, since $\langle T e_j, f_l \rangle_W = \langle T e_j, T(T^*T)^{-1/2} e_l \rangle_W = \langle (T^*T)^{1/2} e_j, e_l \rangle_Z$ for all $j, l \ge 1$. Thereby, we can apply Lemma A.3 in Cardot and Johannes [2008], which gives $\mathcal{T}_d^v = \mathcal{T}_{d,D}^v$ for all sufficiently large $D$. □

Under the extended link condition (3.3), that is, $T \in \mathcal{T}_{d,D}^v$, the next assertion summarizes
minimal conditions to ensure consistency.

Corollary 3.2. Let the assumptions of Proposition 3.1 be satisfied and assume in addition that $T \in \mathcal{T}_{d,D}^v$. If $\hat{\ell}_h$ is defined with threshold $\alpha = 2\sqrt{D / v_m}$ and dimension $m := m(n)$ such that $m^2/(n v_m) = o(1)$ and $1/m = o(1)$, then we have $\mathbb{E}|\hat{\ell}_h - \ell_h(\varphi)|^2 = o(1)$ as $n \to \infty$.



3.3 The upper bound. 

The last assertions show that the estimator defined in (3.2) is consistent without any additional regularity conditions on either the structural function or the representer. The following theorem now provides an upper bound if these conditions are given through ellipsoids $\mathcal{F}_\gamma^\rho$ and $\mathcal{F}_\omega^\tau$ for the structural function and the representer respectively, together with an extended link condition (3.3) for the conditional expectation operator $T$. Furthermore, the result is derived under stronger moment conditions on the bases, more specifically on the random vectors $[e(Z)]$ and $[f(W)]$, which we formalize first.

Assumption A2. There exists $\eta \ge 1$ such that the joint distribution of $(Z, W)$ satisfies

(t) sup,g^E[e2(Z)|W^] ^ and supi^^E[f^{W)] ^ r/\- 

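(ii) $\sup_{j,l \in \mathbb{N}} \mathrm{Var}(e_j(Z) f_l(W)) \le \eta^2$ and

$\sup_{j,l \in \mathbb{N}} \mathbb{E}\,|e_j(Z) f_l(W) - \mathbb{E}[e_j(Z) f_l(W)]|^8 \le 8!\, \eta^6\, \mathrm{Var}(e_j(Z) f_l(W))$.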

It is worth noting that again any joint distribution of $(Z, W)$ satisfies Assumption A2 for sufficiently large $\eta$ if the bases $\{e_j\}_{j \ge 1}$ and $\{f_l\}_{l \ge 1}$ are uniformly bounded. Here and subsequently, we write $a_n \lesssim b_n$ when there exists $C > 0$ such that $a_n \le C\, b_n$ for all sufficiently large $n \in \mathbb{N}$, and $a_n \sim b_n$ when $a_n \lesssim b_n$ and $b_n \lesssim a_n$ simultaneously.

Theorem 3.3. Assume an n-sample of $(Y, Z, W)$ from the model (1.1a-1.1b) with error term $U \in \mathcal{U}_\sigma$, $\sigma > 0$. Suppose that the joint distribution of $(Z, W)$ fulfills Assumption A2 for some $\eta \ge 1$ and that the associated conditional expectation operator $T \in \mathcal{T}_{d,D}^v$, $d, D \ge 1$, where the sequences $\gamma$, $\omega$ and $v$ satisfy Assumption 2.1. Let $m_* := m_*(n)$ and $\delta_n^*$ be such that (2.4) holds for some $A \ge 1$. Consider the estimator $\hat{\ell}_h$ with dimension $m := m_*$
and threshold o? := nmax(l,4D A/7„j). If in addition T := ^ji^^jj^ < 00, then we have 

sup sup E|4 -4((/7)|2 < dD^AAr]^{(7^ + dDr}pT 

{l + m^/7„. +m3|p(||[f]^- \T\nJ^ > n„./(4L»)) |'^'} 



{max(5;,l/n) +P(||[f]™^- [r]^f > t;„./(4Z))) }. 
We shall stress that the bound in the last theorem is non-asymptotic. However, it does not establish the optimality of the estimator compared with the lower bound in Theorem 2.1. The bound in Theorem 3.3 can, however, be improved by imposing a moment condition stronger than Assumption A2. To be more precise, consider the centered random variable $e_j(Z) f_l(W) - \mathbb{E}[e_j(Z) f_l(W)]$. Then Assumption A2 (ii) states that its 8th moment is uniformly bounded over $j, l \in \mathbb{N}$. In the next assumption we suppose that these random variables satisfy Cramér's condition uniformly, which is known to be sufficient to obtain an exponential bound for their large deviations (cf. Bosq [1998]).

Assumption A3. There exists $\eta \ge 1$ such that the joint distribution of $(Z, W)$ satisfies Assumption A2 and in addition

(iii) $\sup_{j,l \in \mathbb{N}} \mathbb{E}\,|e_j(Z) f_l(W) - \mathbb{E}[e_j(Z) f_l(W)]|^k \le \eta^{k-2}\, k!\, \mathrm{Var}(e_j(Z) f_l(W))$, $k = 3, 4, \dots$.

It is well known that Cramér's condition is in particular fulfilled if the random variable $e_j(Z) f_l(W) - \mathbb{E}[e_j(Z) f_l(W)]$ is bounded. Hence in case the bases $\{e_j\}_{j \ge 1}$ and $\{f_l\}_{l \ge 1}$ are uniformly bounded it follows again that any joint distribution of $(Z, W)$ satisfies Assumption A3 for sufficiently large $\eta$. On the other hand, in Lemma A.5 in the Appendix we show that Assumption A3 implies an exponential bound on the large deviation probability $P(\|[\hat{T}]_m - [T]_m\|^2 > v_m/(4D))$. Thereby, if the sequences $\gamma$, $\omega$ and $v$ have the following additional properties

m^(log7mj7mi = 0(1), m^(logmin(t;~5[,u;mJ)7mi = o(l), m^7mi = ^(1) as n ^ 00, 

(3.4) 

where $m_* := m_*(n)$ and $\delta_n^*$ are given by (2.4), then the large deviation probability tends to zero more quickly than $\max(\delta_n^*, 1/n)$. In this situation it is not hard to see that $\max(\delta_n^*, 1/n)$ is the order of the upper bound given in Theorem 3.3. Hence, the rate $\max(\delta_n^*, 1/n)$ is optimal and $\hat{\ell}_h$ is minimax-optimal, which is summarized in the next assertion.

Theorem 3.4. Suppose that the assumptions of Theorem 3.3 are satisfied. In addition
assume that the joint distribution of $(Z, W)$ fulfills Assumption A3 and that the sequences
$\gamma$, $\omega$ and $v$ have the properties (3.4). Then, we have

sup sup E|4 -4((^)|2 < dD^AAr]'^{a^ + dDr}pT max((5;, n'^). 

Remark 3.2. It is worth noting that the bound in the last result is again non-asymptotic.
Furthermore, from Theorems 2.1 and 3.4 it follows that the estimator $\widehat{\ell}_h$ attains
the optimal rate $\max(\delta^*, n^{-1})$ (and hence is minimax-optimal) for all sequences
$\gamma$, $\omega$ and $\upsilon$ satisfying both the minimal regularity conditions summarized in
Assumption 2.1 and the additional properties (3.4). We emphasize the interesting influence of
the sequences $\gamma$, $\omega$ and $\upsilon$. As we see from Theorems 2.1 and 3.4, if the
sequence $\upsilon$ decreases to zero more quickly, then the obtainable optimal rate of
convergence becomes slower. On the other hand, a faster increasing sequence $\gamma$ or $\omega$
leads to a faster optimal rate. In other words, as expected, values of a linear functional given
by a structural function or representer satisfying a stronger regularity condition can be
estimated faster.

Note furthermore that if the eigenfunctions of the operator $T$ are given by $\{e_j\}_{j \geq 1}$
and $\{f_l\}_{l \geq 1}$, then $T \in \mathcal{T}^{\upsilon}_{d,D}$ holds if and only if the
corresponding singular values $[T]_{jj} = \langle T e_j, f_j \rangle$, $j \geq 1$, satisfy
$[T]_{jj}^2 \asymp \upsilon_j$. Hence, in this situation the optimal rate obtained in the last
assertion is linked to the decay of the singular values of $T$. However, the set
$\mathcal{T}^{\upsilon}_{d,D}$ also contains operators with eigenfunctions not given by
$\{e_j\}_{j \geq 1}$ and $\{f_l\}_{l \geq 1}$. In that case their corresponding eigenvalues may
decay far more slowly than the sequence of weights $\upsilon$. Moreover, it is straightforward to
show that, by using a projection onto the bases $\{e_j\}_{j \geq 1}$ and $\{f_l\}_{l \geq 1}$
instead of the eigenfunctions, the obtainable rate of convergence given in Theorem 3.4 may be far
slower than the rate obtained by using the eigenfunctions (see e.g. Johannes and Schenk [2009] in
the context of the functional linear model). However, the rate in Theorem 3.4 is optimal since
the eigenfunctions are generally unknown.

Finally, since the sequence $\gamma$ increases it follows that in Theorem 3.3, and hence also in
Theorem 3.4, for all large enough $n$ the threshold $\alpha = n$ is used to construct the
estimator $\widehat{\ell}_h$. On the other hand, the choice of the dimension $m$ depends on the
sequences $\gamma$ and $\upsilon$ characterizing the regularity conditions imposed on the
structural function and the conditional expectation operator, which are not known in practice.
Building data-driven rules that choose the value of $m$ automatically is certainly a topic that
deserves further attention, and one promising direction is to adapt the selection techniques
proposed in Efromovich and Koltchinskii [2001], Goldenshluger and Pereverzev [2000] or Tsybakov
[2000]. □
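
To make the roles of the dimension $m$ and the threshold $\alpha$ concrete, the following Python
sketch implements a plug-in estimator of the form $[h]_m^t [\widehat{T}]_m^{-1} [\widehat{g}]_m$
with thresholding. It is an assumption about the shape of (3.2), inferred from the notation used
in the Appendix, and not a verbatim transcription: the function names are hypothetical, both bases
are taken to be trigonometric, and the threshold is applied to the spectral norm of the inverse
(whether (3.2) thresholds the norm or its square is not visible in this section).

```python
import numpy as np


def trig_basis(x, m):
    """Evaluate the first m trigonometric basis functions on [0, 1] at the points x."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for k in range(2, m + 1):
        freq = k // 2
        if k % 2 == 0:
            cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * freq * x))
        else:
            cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * freq * x))
    return np.column_stack(cols)


def estimate_functional(y, z, w, h_coef, alpha):
    """Thresholded plug-in estimate of the linear functional (illustrative sketch).

    h_coef holds the first m basis coefficients of the representer h, so m = len(h_coef).
    """
    h_coef = np.asarray(h_coef, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = h_coef.size, y.size
    basis_z = trig_basis(z, m)         # e_j(Z_i)
    basis_w = trig_basis(w, m)         # f_l(W_i)
    t_hat = basis_w.T @ basis_z / n    # empirical version of [T]_m
    g_hat = basis_w.T @ y / n          # empirical version of [g]_m
    try:
        t_inv = np.linalg.inv(t_hat)
    except np.linalg.LinAlgError:
        return 0.0                     # singular matrix: the estimator is set to zero
    if np.linalg.norm(t_inv, 2) > alpha:
        return 0.0                     # inverse too large: thresholding sets it to zero
    return float(h_coef @ t_inv @ g_hat)
```

With a well-conditioned $[\widehat{T}]_m$ the threshold is inactive and the estimate reduces to the
plain plug-in value; a very small `alpha` forces the estimate to zero.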

4 Minimax-optimal estimation under classical smoothness assumptions

In this section we describe the prior information about the unknown structural function
$\varphi$ and the given representer $h$ by their level of smoothness. In order to simplify the
presentation we follow Hall and Horowitz [2005] (where a more detailed discussion of this
assumption can also be found) and suppose that the marginal distributions of the scalar
regressor $Z$ and the scalar instrument $W$ are uniform on the interval $[0, 1]$. It is worth
noting that all the results below can be extended to the multivariate case in a straightforward
way. In the univariate case, however, it follows that both Hilbert spaces $L^2_Z$ and $L^2_W$
equal $L^2[0, 1]$, which is endowed with the usual norm $\|\cdot\|$ and inner product
$\langle \cdot, \cdot \rangle$.

In the last sections we have seen that the choice of the basis $\{e_j\}_{j \geq 1}$ is directly
linked to the a priori assumptions we are willing to impose on the structural function and the
representer. In case of classical smoothness assumptions it is natural to consider the
trigonometric basis

$e_1 :\equiv 1, \quad e_{2j}(s) := \sqrt{2} \cos(2\pi j s), \quad e_{2j+1}(s) := \sqrt{2} \sin(2\pi j s), \quad s \in [0, 1],\ j \in \mathbb{N},$ (4.1)

which can be realized as follows. Let us introduce the Sobolev space of periodic functions
$\mathcal{W}_r$, $r \geq 0$, which for integer $r$ is given by

$\mathcal{W}_r = \{ f \in H_r : f^{(i)}(0) = f^{(i)}(1),\ i = 0, 1, \ldots, r-1 \},$

where $H_r := \{ f \in L^2[0, 1] : f^{(r-1)} \text{ absolutely continuous},\ f^{(r)} \in L^2[0, 1] \}$
is a Sobolev space. If we consider now $\mathcal{F}_{w}$ given in (2.2) with weight sequence
$w_1 = 1$, $w_j = |j|^{2r}$, $j \geq 2$, and trigonometric basis $\{e_j\}$, then it is well known
that the subset $\mathcal{F}_{w}$ coincides with the Sobolev space of periodic functions
$\mathcal{W}_r$ (cf. Neubauer [1988a,b], Mair and Ruymgaart [1996] or Tsybakov [2004]).
Therefore, let us denote by $\mathcal{W}_r^c := \mathcal{F}_{w}^c$, $c > 0$, an ellipsoid in the
Sobolev space $\mathcal{W}_r$. We use in case $r = 0$ again the convention that $\mathcal{W}_0^c$
denotes an ellipsoid in $L^2[0, 1]$. In the rest of this section we suppose that the unknown
structural function $\varphi$ and the given representer $h$ are $p \geq 0$ and $s \geq 0$ times
differentiable, respectively. More precisely, the prior information about $\varphi$ and $h$ is
characterized by the Sobolev ellipsoids $\mathcal{W}_p^{\rho}$, $\rho > 0$, and
$\mathcal{W}_s^{\tau}$, $\tau > 0$, respectively.
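
The trigonometric basis and the weighted norm that defines such an ellipsoid can be explored
numerically. The following Python sketch is only an illustration: the function names and the
midpoint quadrature are assumptions made here, and the weights $|j|^{2r}$ are indexed by the
position $j$ of the basis function, as in the text.

```python
import numpy as np


def trig_basis(x, m):
    """Evaluate e_1, ..., e_m of the trigonometric basis at the points x in [0, 1]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for k in range(2, m + 1):
        freq = k // 2
        if k % 2 == 0:
            cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * freq * x))
        else:
            cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * freq * x))
    return np.column_stack(cols)


def basis_coefficients(f, m, grid_size=4096):
    """Approximate the coefficients <f, e_j>, j = 1, ..., m, by a midpoint rule."""
    x = (np.arange(grid_size) + 0.5) / grid_size
    return trig_basis(x, m).T @ f(x) / grid_size


def weighted_norm_sq(coef, r):
    """Weighted squared norm sum_j |j|^(2r) * coef_j^2 (the weight for j = 1 equals 1)."""
    coef = np.asarray(coef, dtype=float)
    j = np.arange(1, coef.size + 1, dtype=float)
    return float(np.sum(j ** (2 * r) * coef ** 2))
```

A function belongs to the ellipsoid with radius $c$ when this weighted norm, taken over all
coefficients, does not exceed $c$; the sketch only evaluates the first $m$ terms.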

Furthermore, to illustrate the general results in Section 3 we consider two special cases
describing a "regular decay" of the sequence $\upsilon$, which characterizes the mapping
properties of the associated conditional expectation operator. Precisely, we assume in the
following the sequence $\upsilon$ to be either polynomially decreasing, i.e., $\upsilon_1 = 1$
and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, or exponentially decreasing, i.e., $\upsilon_1 = 1$ and
$\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$. In the polynomial case easy calculus
shows that any operator $T$ satisfying the link condition (2.3), that is
$T \in \mathcal{T}_d^{\upsilon}$, acts like integrating $(a)$-times and hence it is called
finitely smoothing (cf. Natterer [1984]). On the other hand, in the exponential case it can
easily be seen that $T \in \mathcal{T}_d^{\upsilon}$ implies $\mathcal{R}(T) \subset \mathcal{W}_r$
for all $r > 0$; therefore the operator $T$ is called infinitely smoothing (cf. Mair [1994]). It
is worth noting that these are the cases usually studied in the literature (cf. Hall and
Horowitz [2005], Chen and Reiß [2008] or Johannes et al. [2007] in the context of nonparametric
estimation of the structural function itself). However, the general results in the last section
can also be applied to more sophisticated sequences. Since in both cases the minimal regularity
conditions given in Assumption 2.1 are satisfied, the lower bounds presented in the next
assertion follow directly from Theorem 2.1.

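Theorem 4.1. Under the assumptions of Theorem 2.1 we have for any estimator $\breve{\ell}$
(i) in the polynomial case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, for some $a > 0$, that

$\sup_{U \in \mathcal{U}_\sigma} \sup_{\varphi \in \mathcal{W}_p^{\rho}} \sup_{h \in \mathcal{W}_s^{\tau}} \{ \mathbb{E}|\breve{\ell} - \ell_h(\varphi)|^2 \} \gtrsim \max(n^{-(p+s)/(p+a)}, n^{-1}),$

(ii) in the exponential case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, that

$\sup_{U \in \mathcal{U}_\sigma} \sup_{\varphi \in \mathcal{W}_p^{\rho}} \sup_{h \in \mathcal{W}_s^{\tau}} \{ \mathbb{E}|\breve{\ell} - \ell_h(\varphi)|^2 \} \gtrsim (\log n)^{-(p+s)/a}.$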

Let us now introduce the second basis $\{f_l\}_{l \geq 1}$, which in this section is also given
by the trigonometric basis. In this situation the additional moment conditions formalized in
Assumptions A1-A3 are automatically fulfilled since both bases $\{e_j\}_{j \geq 1}$ and
$\{f_l\}_{l \geq 1}$ are uniformly bounded. However, we suppose that the associated conditional
expectation operator $T$ satisfies the extended link condition (3.3), that is,
$T \in \mathcal{T}^{\upsilon}_{d,D}$. Thereby, we restrict the set of possible joint distributions
of $(Z, W)$ to those having the trigonometric basis as optimal instruments. On the other hand,
if the dimension $m$ and the threshold $\alpha$ in the definition of the estimator
$\widehat{\ell}_h$ given in (3.2) are chosen appropriately, then by applying Theorem 3.4 the
rates of the lower bound given in the last assertion provide, up to a constant, also the upper
bound of the risk of $\widehat{\ell}_h$, which is summarized in the next theorem. We have thus
proved that these rates are optimal and the proposed estimator is minimax-optimal in both cases.

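Theorem 4.2. Assume an $n$-sample of $(Y, Z, W)$ from the model (1.1a-1.1b) with error
term $U \in \mathcal{U}_\sigma$, $\sigma > 0$, and associated conditional expectation operator
$T \in \mathcal{T}^{\upsilon}_{d,D}$, $d, D \geq 1$. Consider the estimator $\widehat{\ell}_h$
given in (3.2)
(i) in the polynomial case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = |j|^{-2a}$, $j \geq 2$, for some $a > 0$, with
dimension $m \sim n^{1/(2p+2a)}$ and threshold $\alpha \sim n$. If in addition $p \geq 3/2$ then

$\sup_{\varphi \in \mathcal{W}_p^{\rho}} \sup_{h \in \mathcal{W}_s^{\tau}} \{ \mathbb{E}|\widehat{\ell}_h - \ell_h(\varphi)|^2 \} \lesssim \max(n^{-(p+s)/(p+a)}, n^{-1}),$

(ii) in the exponential case, i.e. $\upsilon_1 = 1$ and $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, for some $a > 0$, with
dimension $m \sim (\log n)^{1/(2a)}$ and threshold $\alpha \sim n$. If in addition $p \geq 3/2$, then

$\sup_{\varphi \in \mathcal{W}_p^{\rho}} \sup_{h \in \mathcal{W}_s^{\tau}} \{ \mathbb{E}|\widehat{\ell}_h - \ell_h(\varphi)|^2 \} \lesssim (\log n)^{-(p+s)/a}.$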



Remark 4.1. We emphasize the interesting influence of the parameters $p$, $s$ and $a$
characterizing the smoothness of $\varphi$, $h$ and the smoothing properties of $T$, respectively.
As we see from Theorems 4.1 and 4.2, if the value of $a$ increases the obtainable optimal rate
of convergence becomes slower. Therefore, the parameter $a$ is often called the degree of
ill-posedness (cf. Natterer [1984]). On the other hand, an increase of the value $p + s$ leads to
a faster optimal rate. In other words, as expected, values of a linear functional given by a
smoother structural function or representer can be estimated faster. Moreover, in the polynomial
case, independent of the smoothness assumption imposed on the structural function (only
$p \geq 3/2$ is needed), the parametric rate $n^{-1}$ is obtained if and only if the representer
is smoother than the degree of ill-posedness of $T$, i.e., $s \geq a$. The situation is different
in the exponential case. As long as the representer $h$ is only finitely many times
differentiable, due to Theorems 4.1 and 4.2 the optimal rate of convergence is logarithmic.
However, if we restrict the class of representers even more, e.g. by considering
$\mathcal{F}_\omega^{\tau}$ with weights $\omega_1 := 1$, $\omega_j = \exp(|j|^{2q})$, $j \geq 2$,
which contains only analytic functions given $q > 1$ (cf. Kawata [1972]), then faster rates are
possible. Again independent of the smoothness assumption imposed on the structural function
(again $p \geq 3/2$ is needed), the parametric rate $n^{-1}$ is obtained if and only if the
representer $h$ is smoother than the degree of ill-posedness of $T$, e.g., $q \geq a$. Finally,
in contrast to the polynomial case, in the exponential case the smoothing parameter $m$ does not
depend on the value of $p$. It follows that the proposed estimator is automatically adaptive,
i.e., it does not depend on a priori knowledge of the degree of smoothness of the structural
function $\varphi$. However, the choice of the smoothing parameter depends on the smoothing
properties of $T$, i.e., the value of $a$. □
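
The dependence of the rates in Theorems 4.1 and 4.2 on $p$, $s$ and $a$ can be tabulated with a
few lines of Python. The sketch below only evaluates the stated orders (constants are ignored)
and uses hypothetical function names.

```python
import math


def polynomial_case(n, p, s, a):
    """Order of the dimension and of the risk in the polynomial case."""
    m = n ** (1.0 / (2 * p + 2 * a))
    rate = max(n ** (-(p + s) / (p + a)), 1.0 / n)
    return m, rate


def exponential_case(n, p, s, a):
    """Order of the dimension and of the risk in the exponential case."""
    m = math.log(n) ** (1.0 / (2 * a))
    rate = math.log(n) ** (-(p + s) / a)
    return m, rate


def is_parametric_polynomial(s, a):
    """In the polynomial case the rate 1/n is attained exactly when s >= a."""
    return s >= a
```

For instance, with $n = 10^4$, $p = 2$ and $a = 1$, a representer with $s = 3$ gives the
parametric order $n^{-1}$, whereas $s = 1/2$ gives the slower order $n^{-5/6}$.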



A Appendix

A.1 Proofs of Section 2.

Consider the conditional expectation operator $T$ associated to the regressor $Z$ and the
instrument $W$; then $\mathbb{E}|[Te_j](W)|^2 = \|Te_j\|^2$, $j \in \mathbb{N}$. Therefore, if the
link condition (2.3), that is $T \in \mathcal{T}_d^{\upsilon}$, is satisfied, then it follows that
$\mathbb{E}|[Te_j](W)|^2 \leq d\, \upsilon_j$ for all $j \in \mathbb{N}$. This result will be used
below without further reference. We shall prove at the end of this section the technical
Lemma A.1 used in the next proof.



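Proof of the lower bound.

Proof of Theorem 2.1. We show below for any estimator $\breve{\ell}$ only based on an
$n$-sample of $(Y, Z, W)$ from the model (1.1a-1.1b) the following two lower bounds:

$\sup_{U \in \mathcal{U}_\sigma} \sup_{\varphi \in \mathcal{F}_\gamma^{\rho}} \sup_{h \in \mathcal{F}_\omega^{\tau}} \mathbb{E}|\breve{\ell} - \ell_h(\varphi)|^2 \geq \frac{\delta^*}{4}\, \frac{\tau}{\Delta} \min\Big( \frac{1}{2d}, \frac{\rho}{\Delta} \Big),$ (A.1)

$\sup_{U \in \mathcal{U}_\sigma} \sup_{\varphi \in \mathcal{F}_\gamma^{\rho}} \sup_{h \in \mathcal{F}_\omega^{\tau}} \mathbb{E}|\breve{\ell} - \ell_h(\varphi)|^2 \geq \frac{1}{4n}\, \tau \min\Big( \frac{1}{2d}, \rho \Big).$ (A.2)

Consequently, the result follows by combination of these two lower bounds.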

Proof of (A.1). Consider $(Z, W)$ with associated $T \in \mathcal{T}_d^{\upsilon}$. Define the
structural function $\varphi_* := [\varphi_*]_{m_*} e_{m_*}$, where $m_*$ satisfies (2.4) for some
$\Delta \geq 1$ and $[\varphi_*]_{m_*}$ is given in (A.7) (Lemma A.1). Then from (A.9) in
Lemma A.1 it follows that $\varphi_* \in \mathcal{F}_\gamma^{\rho}$ and thus
$\varphi_*^{(\theta)} := \theta \varphi_* \in \mathcal{F}_\gamma^{\rho}$ with $\theta \in \{-1, 1\}$.
Let $V$ be a Gaussian random variable with mean zero and variance one ($V \sim \mathcal{N}(0, 1)$)
which is independent of $(Z, W)$. Then
$U_\theta := [T\varphi_*^{(\theta)}](W) - \varphi_*^{(\theta)}(Z) + V$ belongs to
$\mathcal{U}_\sigma$ for all sufficiently large $\sigma$, since $\mathbb{E}[U_\theta | W] = 0$ and
$\mathbb{E}[U_\theta^4 | W] \leq 8\{16 \rho^2 \eta + 3\}$. Consequently, for each $\theta$, i.i.d.
copies $(Y_i, Z_i, W_i)$, $1 \leq i \leq n$, of $(Y, Z, W)$ with
$Y := \varphi_*^{(\theta)}(Z) + U_\theta$ form an $n$-sample of the model (1.1a-1.1b) and we
denote their joint distribution by $P_\theta$. In case of $P_\theta$ the conditional distribution
of $Y_i$ given $W_i$ is then Gaussian with mean $\theta [T\varphi_*](W_i)$ and variance 1. Then,
it is easily seen that the log-likelihood of $P_1$ with respect to $P_{-1}$ is given by



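$\log\Big( \frac{dP_1}{dP_{-1}} \Big) = 2 \sum_{i=1}^n (Y_i - [T\varphi_*](W_i))\, [T\varphi_*](W_i) + 2 \sum_{i=1}^n |[T\varphi_*](W_i)|^2.$

Its expectation with respect to $P_1$ satisfies
$\mathbb{E}_{P_1}[\log(dP_1/dP_{-1})] = 2n \|T\varphi_*\|^2 \leq 2nd\, [\varphi_*]_{m_*}^2 \upsilon_{m_*}$
by using $T \in \mathcal{T}_d^{\upsilon}$. In terms of Kullback-Leibler divergence this means
$KL(P_1, P_{-1}) \leq 2dn\, [\varphi_*]_{m_*}^2 \upsilon_{m_*}$. Since the Hellinger distance
$H(P_1, P_{-1})$ satisfies $H^2(P_1, P_{-1}) \leq KL(P_1, P_{-1})$ it follows from (A.9) in
Lemma A.1 that

$H^2(P_1, P_{-1}) \leq 2dn\, [\varphi_*]_{m_*}^2 \upsilon_{m_*} \leq 1.$ (A.3)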
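
The two ingredients of this step, the Kullback-Leibler divergence between two unit-variance
Gaussians with means $\pm\mu$ and the inequality $H^2 \leq KL$, can be checked numerically for a
single observation. The Python sketch below is an illustration with hypothetical names; it uses a
Riemann sum on a truncated grid and the convention $H^2 = 2(1 - \text{affinity})$.

```python
import numpy as np


def kl_and_hellinger_sq(mu, half_width=12.0, num=200001):
    """Return (KL(P_1, P_-1), H^2(P_1, P_-1)) for P_theta = N(theta * mu, 1)."""
    y = np.linspace(-half_width, half_width, num)
    dy = y[1] - y[0]
    log_p = -0.5 * (y - mu) ** 2 - 0.5 * np.log(2 * np.pi)   # log density of N(mu, 1)
    log_q = -0.5 * (y + mu) ** 2 - 0.5 * np.log(2 * np.pi)   # log density of N(-mu, 1)
    p = np.exp(log_p)
    kl = float(np.sum(p * (log_p - log_q)) * dy)              # E_{P_1} log(dP_1 / dP_-1)
    affinity = float(np.sum(np.exp(0.5 * (log_p + log_q))) * dy)
    return kl, 2.0 * (1.0 - affinity)
```

For one observation the divergence equals $2\mu^2$, which is the single-sample analogue of the
identity $\mathbb{E}_{P_1}[\log(dP_1/dP_{-1})] = 2n\|T\varphi_*\|^2$ used above.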

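Consider the Hellinger affinity $\rho(P_1, P_{-1}) = \int \sqrt{dP_1\, dP_{-1}}$; then we obtain
for any estimator $\breve{\ell}$ and for all $h \in \mathcal{F}_\omega^{\tau}$ that

$\rho(P_1, P_{-1}) \leq \int \frac{|\breve{\ell} - \ell_h(\varphi_*^{(1)})|}{2|\ell_h(\varphi_*)|} \sqrt{dP_1\, dP_{-1}} + \int \frac{|\breve{\ell} - \ell_h(\varphi_*^{(-1)})|}{2|\ell_h(\varphi_*)|} \sqrt{dP_1\, dP_{-1}}$

$\qquad \leq \Big( \int \frac{|\breve{\ell} - \ell_h(\varphi_*^{(1)})|^2}{4|\ell_h(\varphi_*)|^2}\, dP_1 \Big)^{1/2} + \Big( \int \frac{|\breve{\ell} - \ell_h(\varphi_*^{(-1)})|^2}{4|\ell_h(\varphi_*)|^2}\, dP_{-1} \Big)^{1/2}.$ (A.4)

Due to the identity $\rho(P_1, P_{-1}) = 1 - \frac{1}{2} H^2(P_1, P_{-1})$ combining (A.3) with
(A.4) yields

$\big\{ \mathbb{E}_{P_1}|\breve{\ell} - \ell_h(\varphi_*^{(1)})|^2 + \mathbb{E}_{P_{-1}}|\breve{\ell} - \ell_h(\varphi_*^{(-1)})|^2 \big\} \geq \frac{1}{2} |\ell_h(\varphi_*)|^2.$ (A.5)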

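Consider now the representer $h_* := [h_*]_{m_*} e_{m_*}$, where $[h_*]_{m_*}^2 := \tau/\omega_{m_*}$.
Then by construction $h_* \in \mathcal{F}_\omega^{\tau}$ and
$|\ell_{h_*}(\varphi_*)|^2 = [h_*]_{m_*}^2 [\varphi_*]_{m_*}^2 \geq (\tau/\Delta) \min(1/(2d), \rho/\Delta)\, \delta^*$
by using (A.9) in Lemma A.1. From (A.5) together with the last estimate we conclude that

$\sup_{U \in \mathcal{U}_\sigma} \sup_{\varphi \in \mathcal{F}_\gamma^{\rho}} \sup_{h \in \mathcal{F}_\omega^{\tau}} \mathbb{E}|\breve{\ell} - \ell_h(\varphi)|^2 \geq \sup_{\theta \in \{-1, 1\}} \mathbb{E}_{P_\theta}|\breve{\ell} - \ell_{h_*}(\varphi_*^{(\theta)})|^2$

$\qquad \geq \frac{1}{2} \big\{ \mathbb{E}_{P_1}|\breve{\ell} - \ell_{h_*}(\varphi_*^{(1)})|^2 + \mathbb{E}_{P_{-1}}|\breve{\ell} - \ell_{h_*}(\varphi_*^{(-1)})|^2 \big\}$

$\qquad \geq \frac{1}{4} |\ell_{h_*}(\varphi_*)|^2 \geq \frac{\delta^*}{4}\, \frac{\tau}{\Delta} \min\Big( \frac{1}{2d}, \frac{\rho}{\Delta} \Big),$

which proves (A.1). The proof of (A.2) is similar to the proof of (A.1), but uses (A.8)
in Lemma A.1 rather than (A.9). To be more precise, we define the structural function
$\varphi_* := [\varphi_*]_1 e_1$ and the representer $h_* := [h_*]_1 e_1$, where $[\varphi_*]_1$ and
$[h_*]_1$ are given in (A.6) (Lemma A.1). Then by following along the same lines as in the proof
of (A.1) we obtain (A.2), which completes the proof. □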
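Lemma A.1. Consider sequences $\upsilon$, $\gamma$ and $\omega$ satisfying Assumption 2.1. Let
$m_*$ and $\delta^*$ be such that (2.4) holds true for some $\Delta \geq 1$. If we define

$[h_*]_1^2 := \tau, \quad [\varphi_*]_1^2 := \frac{\zeta_1}{n}, \quad \text{with } \zeta_1 := \min\Big\{ \frac{1}{2d}, \rho \Big\},$ (A.6)

$[h_*]_{m_*}^2 := \frac{\tau}{\omega_{m_*}} \quad \text{and} \quad [\varphi_*]_{m_*}^2 := \frac{\zeta}{n\, \upsilon_{m_*}}, \quad \text{where } \zeta := \min\Big\{ \frac{1}{2d}, \frac{\rho}{\Delta} \Big\},$ (A.7)

then we have

$2dn\, \upsilon_1 [\varphi_*]_1^2 \leq 1; \quad \gamma_1 [\varphi_*]_1^2 \leq \rho; \quad [h_*]_1^2 [\varphi_*]_1^2 \geq (1/n)\, \tau \min(1/(2d), \rho);$ (A.8)

$2dn\, \upsilon_{m_*} [\varphi_*]_{m_*}^2 \leq 1; \quad \gamma_{m_*} [\varphi_*]_{m_*}^2 \leq \rho; \quad [h_*]_{m_*}^2 [\varphi_*]_{m_*}^2 \geq \delta^* (\tau/\Delta) \min(1/(2d), \rho/\Delta).$ (A.9)

Proof. We only prove (A.9). The proof of (A.8) follows analogously and we omit the
details. The first inequality in (A.9) is obtained trivially by using the definition of $\zeta$.
The second and third inequalities in (A.9) follow from the definition of $m_*$ and $\delta^*$
given in (2.4), i.e., $\gamma_{m_*} [\varphi_*]_{m_*}^2 \leq \Delta \zeta$ and
$[h_*]_{m_*}^2 [\varphi_*]_{m_*}^2 = \zeta \tau\, (\gamma_{m_*}/(n \upsilon_{m_*}))\, \delta^* \geq \zeta (\tau/\Delta)\, \delta^*$,
together with the definition of $\zeta$, which completes the proof. □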

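A.2 Proofs of Section 3.

We begin by defining and recalling notation to be used in the proofs of this section. Given
$m > 0$, denote $\varphi_m := \sum_{j=1}^m [\varphi_m]_j e_j$ with
$[\varphi_m]_m = [T]_m^{-1} [g]_m$, which is well defined since $[T]_m$ is non-singular. Then, the
identities $[T(\varphi - \varphi_m)]_m = 0$ and
$[\varphi_m - E_m\varphi]_m = [T]_m^{-1} [T E_m^{\perp} \varphi]_m$ hold true. Furthermore, let
$[\Xi]_m := [\widehat{T}]_m - [T]_m$ and define the vectors $[B]_m$ and $[S]_m$ by

$[B]_j := \frac{1}{n} \sum_{i=1}^n U_i f_j(W_i), \quad [S]_j := \frac{1}{n} \sum_{i=1}^n f_j(W_i) \{ \varphi(Z_i) - [\varphi_m]_m^t [e]_m(Z_i) \}, \quad 1 \leq j \leq m,$ (A.10)

where $[\widehat{g}]_m - [\widehat{T}]_m [\varphi_m]_m = [B]_m + [S]_m$. Note that
$\mathbb{E}[B]_m = 0$ due to the mean independence, i.e., $\mathbb{E}(U|W) = 0$, and that
$\mathbb{E}[S]_m = [T\varphi]_m - [T\varphi_m]_m = 0$. Moreover, let us introduce the events

$\Omega := \{ \|[\widehat{T}]_m^{-1}\| \leq \alpha \}, \quad \Omega_{1/2} := \{ \|[\Xi]_m\| \|[T]_m^{-1}\| \leq 1/2 \},$

$\Omega^c := \{ \|[\widehat{T}]_m^{-1}\| > \alpha \} \quad \text{and} \quad \Omega_{1/2}^c := \{ \|[\Xi]_m\| \|[T]_m^{-1}\| > 1/2 \}.$ (A.11)

Observe that $\Omega_{1/2} \subset \Omega$ in case $\alpha \geq 2\|[T]_m^{-1}\|$. Indeed, if
$\|[\Xi]_m\| \|[T]_m^{-1}\| \leq 1/2$ then the identity
$[\widehat{T}]_m = [T]_m \{ I + [T]_m^{-1} [\Xi]_m \}$ implies by the usual Neumann series argument
that $\|[\widehat{T}]_m^{-1}\| \leq 2\|[T]_m^{-1}\|$. Thereby, if $\alpha \geq 2\|[T]_m^{-1}\|$,
then we have $\Omega_{1/2} \subset \Omega$. These results will be used below without further
reference.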
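
The Neumann series step can be verified numerically on small matrices. The following Python
sketch is an illustration with a hypothetical function name; it checks that a perturbation with
$\|\Xi\| \|T^{-1}\| \leq 1/2$ at most doubles the spectral norm of the inverse.

```python
import numpy as np


def neumann_bound_holds(t_matrix, t_hat):
    """Check ||T_hat^{-1}|| <= 2 ||T^{-1}|| when ||T_hat - T|| * ||T^{-1}|| <= 1/2.

    Returns None if the premise fails, otherwise True or False.
    """
    def spectral_norm(a):
        return np.linalg.norm(a, 2)

    t_inv = np.linalg.inv(t_matrix)
    xi = t_hat - t_matrix
    if spectral_norm(xi) * spectral_norm(t_inv) > 0.5:
        return None  # premise of the argument is not satisfied
    return bool(spectral_norm(np.linalg.inv(t_hat)) <= 2.0 * spectral_norm(t_inv))
```

The bound follows from $\|(I + A)^{-1}\| \leq 1/(1 - \|A\|)$ for $\|A\| < 1$, applied with
$A = T^{-1}\Xi$.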

We shall prove at the end of this section four technical lemmas (A.2 - A.5) which are
used in the following proofs.

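Proof of the consistency.

Proof of Proposition 3.1. Let
$\tilde{\ell}_h := \ell_h(\varphi_m)\, 1\{ \|[\widehat{T}]_m^{-1}\| \leq \alpha \}$. Then the proof is
based on the decomposition

$\mathbb{E}|\widehat{\ell}_h - \ell_h(\varphi)|^2 \leq 2 \{ \mathbb{E}|\widehat{\ell}_h - \tilde{\ell}_h|^2 + \mathbb{E}|\tilde{\ell}_h - \ell_h(\varphi)|^2 \}.$ (A.12)

Under the assumption $\alpha \geq 2\|[T]_m^{-1}\|$ we show below that for all $n \geq 1$

$\mathbb{E}|\widehat{\ell}_h - \tilde{\ell}_h|^2 \leq 2 \|h\|^2 \cdot \frac{m \alpha^2}{n} \cdot \{ \eta \|\varphi - \varphi_m\|^2 + \sigma^2 \},$ (A.13)

$\mathbb{E}|\tilde{\ell}_h - \ell_h(\varphi)|^2 \leq 2 \|h\|^2 \Big\{ \|\varphi - \varphi_m\|^2 + \|\varphi_m\|^2 \cdot \eta \cdot \frac{m^2 \alpha^2}{n} \Big\}.$ (A.14)

Moreover, we have $\|\varphi - \varphi_m\| = o(1)$ as $m \to \infty$, which can be realized as
follows. Consider the decomposition
$\|\varphi - \varphi_m\| \leq \|E_m^{\perp}\varphi\| + \|E_m\varphi - \varphi_m\|$, where
$\|E_m^{\perp}\varphi\| = o(1)$ by using Lebesgue's dominated convergence theorem. The consistency
of $\varphi_m$ follows then from
$\|E_m\varphi - \varphi_m\| \leq \|E_m^{\perp}\varphi\| \sup_m \|T_m^{-1} F_m T E_m^{\perp}\| = O(\|E_m^{\perp}\varphi\|)$.
Consequently, the conditions on $m$ and $\alpha$ ensure the convergence to zero as $n \to \infty$
of the bound given in (A.13) and (A.14), respectively, which proves the result.

Proof of (A.13). By making use of the identity
$[\widehat{g}]_m - [\widehat{T}]_m [\varphi_m]_m = [B]_m + [S]_m$ the Cauchy-Schwarz inequality and
$\|[\widehat{T}]_m^{-1}\| 1_{\Omega} \leq \alpha$ imply together

$\mathbb{E}|\widehat{\ell}_h - \tilde{\ell}_h|^2 \leq \|h\|^2 \cdot \alpha^2 \cdot \mathbb{E}\|[B]_m + [S]_m\|^2,$

and hence (A.13) follows from (A.21) and (A.22) in Lemma A.2.
The estimate (A.14) follows from the decomposition

$\mathbb{E}|\tilde{\ell}_h - \ell_h(\varphi)|^2 \leq 2 \|h\|^2 \{ \|\varphi - \varphi_m\|^2 + \|\varphi_m\|^2 P(\Omega^c) \},$

where we claim that $P(\Omega^c) \leq 4 \eta m^2 \|[T]_m^{-1}\|^2 / n \leq \eta m^2 \alpha^2 / n$.
Indeed, since $\alpha \geq 2\|[T]_m^{-1}\|$ it follows that $\Omega^c \subset \Omega_{1/2}^c$; thus
by applying Markov's inequality we obtain from (A.23) in Lemma A.2 the estimate, which completes
the proof. □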

Proof of Corollary 3.2. By combination of the identity
$[\varphi_m - E_m\varphi]_m = [T]_m^{-1} [T E_m^{\perp} \varphi]_m$ and the estimate (A.31) in the
proof of Lemma A.4 with $\gamma \equiv 1$ the extended link condition (3.3), that is
$T \in \mathcal{T}^{\upsilon}_{d,D}$, implies
$\|T_m^{-1} F_m T E_m^{\perp}\|^2 = \sup_{\|\varphi\| = 1} \|E_m\varphi - \varphi_m\|^2 \leq D d$.
Moreover,
$2\|[T]_m^{-1}\| \leq 2 \|[\mathrm{Diag}(\upsilon)]_m^{-1/2}\| \, \|[\mathrm{Diag}(\upsilon)]_m^{1/2} [T]_m^{-1}\| \leq 2\sqrt{D/\upsilon_m} = \alpha$
since $\upsilon$ is non-increasing. By using these estimates the result follows directly from
Proposition 3.1. □



Proof of the upper bound. 

Proof of Theorem 3.3. Our proof starts with the observation that by using the defini- 
tion (2.4) of m=K, that is, 1/fm* ^ nA/yrn,, the condition on the dimension m = implies 
that m^/{nvm) ^ A and that the threshold satisfies both = nmax(l,4L' A/7m) ^ 
4||[r]-i|p and o^/n ^ 4I?A. On the other hand, we show below under the condition 
a ^ 2||[T]~^|| the following two bounds: 



E 14 - < (C/n) ||[Diag(a,«)l„'ll Mi D „* (a^ + v,„||2) 



Vmn n 

(A.15) 

E\it-ihy)\''^2[\\hfyrnfP{^\/2)+l^^ 

(A.16) 
for some generic constant $C > 0$ uniformly for all $n \in \mathbb{N}$, where
$\|\varphi_m\|^2 \leq 2 \{ \|\varphi - \varphi_m\|^2 + \|\varphi\|^2 \} \leq 2 \{ 2Dd + 1 \} \|\varphi\|_\gamma^2 \leq 6Dd \|\varphi\|_\gamma^2$
and $\|\varphi - \varphi_m\|_\gamma^2 \leq 2Dd \|\varphi\|_\gamma^2$ due to (A.29) in

Lemma A.4. Thus, by using 9^1^^ C > Vm/{^D)} and ||[Diag(wu)]rn^/^f ^ 

mayiiuj^ ,Vm) (Assumption 2.1) it follows again from the decomposition (A. 12) by 
combination of (A. 15) and (A. 16) that uniformly for all ip G J-^ and h G 

3 

E 14 - 4^1^ ^ C(iZ)2 A7?^{cj2 + dL>r} |l + A — + AmV(^V2)l^^^ 

• |(nr;m)"^max(a;~\i;m) + max(tj;;\ -u^) +P(||[H]m||^ > i;m/(4D))|. 

The result follows now from {{nvm,)~^ + 7mi} ™3^('^mi > ^m, ) =^ 2 A max((5* , 1/n) by using 
the definition of 5* given in (2.4). 

Proof of (A.15). By making use of the identity \g\rn - [T]m[v5m]m = [B]m + [S]m it 
follows 

4 -4 = [h]L{[T]-J + m^Hm™ - [T]^)[f]^l}{[^]m + [5]rn}lc 

where (A.24) and (A.25) in Lemma A.3 with := V|| || imply together 

mhW^ mrn + [SU}? ^ (2/n) • \\[hUT]-2r r?2 {a'+T\\^- ^mt,). (A.17) 
On the other hand we show below that there exists a generic constant C > such that 



Consequently, the inequality (A.15) follows by combination of (A.17) and (A. 18) together 
with mjr]^r ^ \\[hYJ,Dlag{v)]-^'^fD ^ \\h\\l \\[I}lag{ojv)]-^"fD sincejr E Tj;^. 

The proof of (A. 18) starts with the observations that T G 7^^^ implies ||[T]~^||ln^^2 ^ 
2||[^]m"'^|| ^ '^\J D jvm and that ||[T]~-^||ln ^ a. By using these estimates we obtain 



/2 



Consequently, (A.26), (A.27) and (A.28) in Lemma A.28 imply together (A. 18). 
Proof of (A. 16). Following along the lines of the proof of (A. 14) we obtain 

E \tl - 4((^)P ^ 2{|(/l, - ^m)? + \\hf\\^J\^P{Sl\i^)}. 

Then, due to G JT^ and h G the estimate (A. 30) in Lemma A. 4 implies (A. 16), which 
completes the proof. □ 






Proof of Theorem 3.4. The result follows from Theorem 3.3 since m^7„][ = 0(1) by 
using the additional properties (3.4) and 



p( \\[fu, - [r]„j|2 > vmJim) ^'^ = 0(1), (A.19) 



P[p'W-\T\rnA? >jjf)= 0(max(5:,l/n)), (A.20) 

which can be realized as follows. Consider first (A.20). From the definition of m^, follows 
nvm,^m^ ^ A^^. By using this estimate together with (A. 35) in Lemma A. 5 we conclude 



1/4 



^ 2i/^exp{-(nt;„,m;2)/(80Dr/2) + (7/2) log m J 

^ 2V^exp|-I^f \ - H^m}^)}, 

Consequently, (A.19) follows also from the conditions (3.4), that is, mfj^^^ = 0{1). Con- 
sider (A.20). From the definition of moreover follows min((5*~"^, n) ^ Ajm, ™i^(''-'m^ ) "^m* 
This estimate and nVm,Mm.t ^ 1/A together with (A. 35) in Lemma A. 5 implies now 

^m{5r\n)P{\\[f]rn^-[T]rn£ > U^Jim) 

^ 2exp{ — (nt;m^m;7^)/(20Dr/^) + 21ogm* + logmin(5*~^, n)} 
7m, /I "T-* 2 log + log A 



^ 2exp|- 



\20L'?72A 7^, m* 

ml log 7m. mi log min(t; 



7m, 7n 



)}■ 



Thus, the estimate (A.20) follows again by using the additional properties (3.4), which 
completes the proof. □ 

Technical assertions. 

The following paragraph gathers technical results used in the proofs of Section 3.

Lemma A.2. Suppose that the error term $U$ satisfies $\mathbb{E}[U^2|W] \leq \sigma^2$,
$\sigma > 0$, and that the joint distribution of $(Z, W)$ fulfills Assumption A1. Then for all
$m \in \mathbb{N}$ we have

$\mathbb{E}\|[B]_m\|^2 \leq (m/n) \cdot \sigma^2,$ (A.21)

$\mathbb{E}\|[S]_m\|^2 \leq (m/n) \cdot \eta \cdot \|\varphi - \varphi_m\|^2,$ (A.22)

$\mathbb{E}\|[\Xi]_m\|^2 \leq (m^2/n) \cdot \eta.$ (A.23)

Proof. Proof of (A.21) and (A.22). Consider
$\mathbb{E}\|[B]_m\|^2 = \sum_{j=1}^m \mathbb{E}|(1/n) \sum_{i=1}^n U_i f_j(W_i)|^2$.
By using the mean independence (Assumption A1), i.e., $\mathbb{E}[U|W] = 0$, it follows that the
random variables $(U_i f_j(W_i))$, $1 \leq i \leq n$, are i.i.d. with mean zero, thus
$\mathbb{E}\|[B]_m\|^2 = (1/n) \sum_{j=1}^m \mathbb{E}|U f_j(W)|^2$. Thus,
$\mathbb{E}(U^2|W) \leq \sigma^2$ and $\mathbb{E} f_j^2(W) = 1$ imply (A.21). Consider (A.22),
where for each $1 \leq j \leq m$ the random variables
$(\{ \varphi(Z_i) - [\varphi_m]_m^t [e]_m(Z_i) \} f_j(W_i))$, $1 \leq i \leq n$, are i.i.d. with
mean zero. Thus
$\mathbb{E}\|[S]_m\|^2 \leq (m/n) \sup_{j} \mathbb{E}|\{ \varphi(Z) - [\varphi_m]_m^t [e]_m(Z) \} f_j(W)|^2$
and hence (A.22) follows from Assumption A1, i.e.,
$\sup_{j \in \mathbb{N}} \mathbb{E}[f_j^2(W)|Z] \leq \eta$, together with
$\mathbb{E}\{ \varphi(Z) - [\varphi_m]_m^t [e]_m(Z) \}^2 = \|\varphi - \varphi_m\|^2$.

Proof of (A.23). Let $1 \leq j, l \leq m$. Then $(e_j(Z_i) f_l(W_i) - [T]_{l,j})$,
$1 \leq i \leq n$, are i.i.d. with mean zero and
$\mathbb{E}[\Xi]_{l,j}^2 = n^{-1} \mathbb{E}\{ e_j(Z) f_l(W) - [T]_{l,j} \}^2 \leq n^{-1} \eta$ by
Assumption A1. Consequently, (A.23) follows from the estimate
$\mathbb{E}\|[\Xi]_m\|^2 \leq \sum_{j,l=1}^m \mathbb{E}[\Xi]_{l,j}^2$, which completes the proof. □



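Lemma A.3. Let $\mathbb{S}^m := \{ z \in \mathbb{R}^m : z^t z = 1 \}$. Suppose that
$U \in \mathcal{U}_\sigma$, $\sigma > 0$, and that the joint distribution of $(Z, W)$ satisfies
Assumption A2. If in addition $\varphi \in \mathcal{F}_\gamma$ with
$\Gamma = \sum_{j \geq 1} \gamma_j^{-1} < \infty$, then there exists a constant $C > 0$ such that
for all $m \in \mathbb{N}$

$\sup_{z \in \mathbb{S}^m} \{ \mathbb{E}|z^t [B]_m|^2 \} \leq (1/n)\, \sigma^2,$ (A.24)

$\sup_{z \in \mathbb{S}^m} \{ \mathbb{E}|z^t [S]_m|^2 \} \leq (1/n)\, \eta^2\, \Gamma\, \|\varphi - \varphi_m\|_\gamma^2,$ (A.25)

$\mathbb{E}\|[B]_m\|^4 \leq C \cdot ((m/n) \cdot \sigma^2 \cdot \eta^2)^2,$ (A.26)

$\mathbb{E}\|[S]_m\|^4 \leq C \cdot ((m/n) \cdot \eta^2 \cdot \Gamma \cdot \|\varphi - \varphi_m\|_\gamma^2)^2,$ (A.27)

$\mathbb{E}\|[\Xi]_m\|^8 \leq C \cdot ((m^2/n) \cdot \eta^2)^4.$ (A.28)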

Proof. Consider (A.24). Let z G S™. By using the mean independence, i.e., E[C/|T^] = 0, 
it follows that the random variables (UiY^J^^ Zjfj(Wi)), 1 ^ i ^ n, are i.i.d. with mean 
zero. Therefore, we have E\z^[B]^\'^ = {l/n)E\UYlf=iZjfj{W)\'^. Then (A.24) follows 
from E(?72|VF) ^ (E (C/^jT^)) ^2 ^ and E[fj{W)fi{W)] = 6ji with 6ji = 1 if j = / and 
zero otherwise. Consider (A.25). Since {fj{W){if{Z) — [^m]m[^]rni^)}) mean zero, it 
follows that {{ip{Zi) — ['^m]m[^]m{Zi)} X^JLi ^jfji^i))^ 1 ^ « ^ ^^, are i.i.d. with mean zero. 
Thus, E\z'[SU^ = {l/n)E\{^{Z) - b„^]LNrn(^)} E,"! Then (A.25) follows 

from Assumption A2 (i), i.e., supi^^E[\ei{Z)\'^\W] ^ r/^, and E [fj{W)fi{W)] = 6ji. Indeed, 
by using the Cauchy-Schwarz inequality and that EjeN^J^ = F < cx) we have 

m m 

j=i /eN j=i 

m 

ZeN j=i 

Proof of (A.26). Since E||[S]„f ^ mJ2]Li^\{'^/n)J2i=iUifj{Wi)\^, where for each 
1 ^ j ^ m the random variables {Uifj{Wi)), 1 ^ i ^ n, are i.i.d. with mean zero. It follow 
from Theorem 2.10 in Petrov [1995] that E |(l/n) Y."=i Uifj{Wi)\* ^ Cn'^E \Ufj{W)\'^ for 
some generic constant C > 0. Thus, by using E(?7^|l^) ^ o"^ and supjgpjE |/j(VF)|^ ^ r/'* 
(Assumption A2 (i)), we obtain (A.26). The proof of (A.27) follows in analogy to the proof of 
(A.26). Observe that for each 1 ^ j ^ m, {{ip{Zi) — ['^m\m[^]rn{Zi)} fj (Wi)) , 1 ^ i ^ n, are 
i.i.d. with mean zero and E \{ip{Zi) — [^m]m[^]rni^i)} fj(^i)\'^ ^ r]'^T'^ — (/Jmll^, which can 
be realized as follows. Since [T{ip—ipm)]j = it follows that {(p{Z) — [ipm]ln[^]rni^)}fji^) — 
EzeN^ ~ ^■m]i{ei{Z)fj{W) - [T]j^i]. Furthermore, by using Assumption A2 (ii), i.e., 
su.]ij^ifzfiE\ei[Z)fj{W) — ^ 2?7'^, the Cauchy-Schwarz inequality implies 



ZeN 



< \\ip-^m\\^j:Hr]\ 
Proof of (A.28). The random variables $(e_l(Z_i) f_j(W_i) - [T]_{j,l})$, $1 \leq i \leq n$, are
i.i.d. with mean zero for each $1 \leq j, l \leq m$. Hence, Theorem 2.10 in Petrov [1995]
together with Assumption A2 (ii), i.e.,
$\sup_{j,l \in \mathbb{N}} \mathbb{E}|e_l(Z) f_j(W) - [T]_{j,l}|^8 \leq 8!\, \eta^8$, implies
$\sum_{j,l=1}^m \mathbb{E}[\Xi]_{j,l}^8 \leq C m^2 n^{-4} \eta^8$. Consequently, (A.28) follows from
the estimate $\mathbb{E}\|[\Xi]_m\|^8 \leq m^6 \sum_{j,l=1}^m \mathbb{E}[\Xi]_{j,l}^8$, which
completes the proof. □

Lemma A.4. Let g = Tip and denote ipm '■= [T]^[g]m, m G N. If T G '^dD '^^^ ^ -^7' 
then for all ^ s ^ 1 we obtain 

sup{-f}^'\\ip-ipmf^s} !^2Ddp. (A.29) 

meN 

If in addition h G J^l^, then under Assumption 2.1 we have 

sup{7mmin(a;m,t;~^) \{h,if- ipm)?} ^ 2KDdpT. (A.30) 

mGN 

Proof. Consider the decomposition 

\W - ^mW^^s ^ 2{\\ip - Em'^W^s + \\Em^ - ifmW'^s} . 

Since (7|~^) is monotonically decreasing it follows that \\ip — Em'-pW'^a ^ 7^""^ Il¥'ll7) while 
we show below that 

\\Em^ - ^ Dd-f'-^ (A.31) 

Consequently, by combination of these two bounds the condition Lp G JT^, i.e., ^ p, 

implies (A.29). Consider (A.31). Since T G T^j^, i.e., sup„gi^|| [Diag(i;)]^^[r]^i f ^ D 
and ||r/||2 ^ dWfWl for ah / E L|, the identity '[^^(/^ - 99^]^ = -[T\^[T E^ip]^ implies 
WEmip) — 'P'mWt ^ -^ll^-^m'/'lP ^^"^ hence 

ll-E'mV' - 95m||S ^ Dd-i:;^Vm\W\\'^^ (A.32) 



because (7^ ^Vj) is monotonically decreasing. Furthermore, since {ijVj ^) is monotonically 
increasing we have \\Em'P — V'mll^s ^ Im'^^ \\Em'P — '^■mWt- The inequality (A.31) follows 
now by combination of the last estimate and (A.32). 

Proof of (A.30). By applying the Cauchy-Schwarz inequality we have 

\{h,^- E^^)\'' ^ u:-^^^ WhfMt, (A.33) 

and by using (A.32) it follows 

\l,h,Em^ - ^ ||[Diag(^)]^i/2[Diag(t;)]„i/2||2||(^^^ _ ^^)||2 

{ sup l/(u;jt;j)}-Dd7.^^t;m (A.34) 



Since under Assumption 2.1 there exist a constant A such that for all m G N holds 
Umsup]^^j^^{l/(ci;jfj)} ^ Amax(u;~^, fm) the assertion (A.30) follows from (A.33) and 
(A.33), which completes the proof. □ 

Lemma A.5. Suppose that the joint distribution of $(Z, W)$ satisfies Assumption A3. If in
addition the sequence $\upsilon$ fulfills Assumption 2.1, then for all $m \in \mathbb{N}$ we have

$P(\|[\Xi]_m\|^2 > \upsilon_m/(4D)) \leq 2 \exp\{ -(n \upsilon_m / m^2)/(20 D \eta^2) + 2 \log m \}.$ (A.35)

Proof. Our proof starts with the observation that for all $j, l \in \mathbb{N}$ the condition
(iii) in Assumption A3 implies for all $t > 0$

$P\Big( \Big| \sum_{i=1}^n \{ e_j(Z_i) f_l(W_i) - \mathbb{E}[e_j(Z) f_l(W)] \} \Big| \geq t \Big) \leq 2 \exp\{ -t^2 / (4n\, \mathrm{Var}(e_j(Z) f_l(W)) + 2 \eta t) \},$

which is just Bernstein's inequality (a detailed discussion can be found, for example, in
Bosq [1998]). Therefore, the condition
$\sup_{j,l \in \mathbb{N}} \mathrm{Var}(e_j(Z) f_l(W)) \leq \eta^2$ (Assumption A2 (ii)) implies
now for all $t > 0$

$\sup_{j,l \in \mathbb{N}} P\Big( \Big| \sum_{i=1}^n \{ e_j(Z_i) f_l(W_i) - \mathbb{E}[e_j(Z) f_l(W)] \} \Big| \geq t \Big) \leq 2 \exp\{ -t^2 / (4n \eta^2 + 2 \eta t) \}.$ (A.36)

On the other hand, it is well known that
$m^{-1} \|[A]_m\| \leq \max_{1 \leq j, l \leq m} |[A]_{j,l}|$ for any $m \times m$ matrix $[A]_m$.
Combining the last estimate and (A.36) we obtain for all $t > 0$

$P(m^{-1} \|[\Xi]_m\| \geq t) \leq \sum_{j,l=1}^m P\Big( \Big| \sum_{i=1}^n \{ e_j(Z_i) f_l(W_i) - \mathbb{E}[e_j(Z) f_l(W)] \} \Big| \geq nt \Big) \leq 2 \exp\{ -(n t^2)/(4 \eta^2 + 2 \eta t) + 2 \log m \}.$

From the last estimate it follows now

$P(\|[\Xi]_m\|^2 > \upsilon_m/(4D)) \leq 2 \exp\{ -(n \upsilon_m / m^2) / (4D (4 \eta^2 + (\eta / D^{1/2}) (\upsilon_m^{1/2} / m))) + 2 \log m \},$

which together with Assumption 2.1, that is, $\upsilon_m^{1/2}/m \leq \eta D^{1/2}$, implies the
result. □
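
The matrix norm inequality and the union bound used in this proof are elementary and can be
checked directly. The Python sketch below uses hypothetical helper names; the inequality follows
from the fact that the spectral norm is at most the Frobenius norm, which is at most $m$ times
the largest absolute entry.

```python
import numpy as np


def scaled_spectral_norm(a):
    """Return ||A|| / m for an m x m matrix A, with ||.|| the spectral norm."""
    a = np.asarray(a, dtype=float)
    return np.linalg.norm(a, 2) / a.shape[0]


def union_bound(m, single_tail):
    """Bound P(max over m^2 entries exceeds t) by m^2 times a single-entry tail bound."""
    return min(1.0, m * m * single_tail)
```

The factor $m^2$ from the union bound is what appears as the term $2 \log m$ inside the
exponential in (A.35).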

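A.3 Proofs of Section 4

The lower bounds.

Proof of Theorem 4.1. Observe that $\mathcal{W}_p^{\rho} = \mathcal{F}_\gamma^{\rho}$ and
$\mathcal{W}_s^{\tau} = \mathcal{F}_\omega^{\tau}$ with weights $\gamma = (\gamma_j)_{j \geq 1}$ and
$\omega = (\omega_j)_{j \geq 1}$ given by $\gamma_1 := 1$, $\gamma_j := |j|^{2p}$ and
$\omega_1 := 1$, $\omega_j := |j|^{2s}$, $j \geq 2$, respectively. Obviously, the sequences
$\gamma$, $\omega$ and $\upsilon$ given in (i) by $\upsilon_1 = 1$, $\upsilon_j = |j|^{-2a}$ and
(ii) by $\upsilon_1 = 1$, $\upsilon_j = \exp(-|j|^{2a})$, $j \geq 2$, satisfy Assumption 2.1.
Furthermore, in case (i) we have $\gamma_{m_*}/\upsilon_{m_*} = m_*^{2a+2p}$. It follows that $m_*$
and $\delta^*$ given in (2.4) of Theorem 2.1 satisfy $m_* \sim n^{1/(2p+2a)}$ and
$\delta^* \sim n^{-(p+s)/(p+a)}$, respectively. On the other hand,
$\gamma_{m_*}/\upsilon_{m_*} = m_*^{2p} \exp(m_*^{2a})$ implies in case (ii) that
$m_* \sim (\log n)^{1/(2a)}$ and $\delta^* \sim (\log n)^{-(p+s)/a}$. Consequently, the lower bounds
in Theorem 4.1 follow by applying Theorem 2.1. □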
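
The balancing of $\gamma_m/\upsilon_m$ against $n$ can also be carried out numerically. The Python
sketch below assumes, as a simplification of (2.4) that is not stated in this form in the paper,
that the dimension is the largest $m$ with $\gamma_m \leq n\, \upsilon_m$; all function names are
hypothetical.

```python
import math


def balance_dimension(n, gamma, upsilon, m_max=10_000):
    """Largest m <= m_max with gamma(j) <= n * upsilon(j) for all 2 <= j <= m."""
    m = 1
    while m < m_max and gamma(m + 1) <= n * upsilon(m + 1):
        m += 1
    return m


def polynomial_weights(p, a):
    """gamma_j = j^(2p) and upsilon_j = j^(-2a)."""
    return (lambda j: float(j) ** (2 * p)), (lambda j: float(j) ** (-2 * a))


def exponential_weights(p, a):
    """gamma_j = j^(2p) and upsilon_j = exp(-j^(2a))."""
    return (lambda j: float(j) ** (2 * p)), (lambda j: math.exp(-float(j) ** (2 * a)))
```

In the polynomial case the result tracks $n^{1/(2p+2a)}$ closely, while in the exponential case
the agreement with $(\log n)^{1/(2a)}$ is only in order of magnitude for moderate $n$, because of
the additional polynomial factor $m^{2p}$.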

The upper bounds. 

Proof of Theorem 4.2. Observe that in both cases the condition (3.4) is satisfied if
$p \geq 3/2$. Since the condition on $m$ and $\alpha$ ensures in both cases that $m \sim m_*$ and
$\alpha \sim n$ (see the proof of Theorem 4.1), the result follows from Theorem 3.4. □